1. Introduction and Motivation

Quantile regression (QR) is an increasingly important empirical tool in economics and 
other sciences for analyzing the impact of a set of regressors X on features of the conditional 
distribution of an outcome Y (see Koenker, 2005). In many applications the features of
interest are the extremal or tail quantiles of the conditional distribution. This paper provides
practical tools for performing inference on these features using extremal QR and extreme 
value theory. The key problem we address is that conventional inference methods for QR, 
based on the normal distribution, are not valid for extremal QR. By using extreme value 
theory, which specifically accounts for the extreme nature of the tail data, we are able to 
provide inference methods that are valid for extremal QR. 

Before describing the contributions of this paper in more detail, we first motivate the use 
of extremal quantile regression in specific economic applications. Extremal quantile regression
provides a useful description of important features of the data in these applications,
generating both reduced-form facts and inputs into the estimation of structural models.
In what follows, $Q_Y(\tau|X)$ denotes the conditional $\tau$-quantile of Y given X; extremal conditional
quantile refers to the conditional quantile function $Q_Y(\tau|X)$ with the quantile index
$\tau = \epsilon$ or $1 - \epsilon$, where $\epsilon$ is close to zero; and extremal quantile regression refers to the quantile
regression estimator of an extremal conditional quantile.

A principal area of economic applications of extremal quantile regression is risk manage- 
ment. One example in this area is conditional value-at-risk analysis from financial economics 
(Chernozhukov and Umantsev 2001, Engle and Manganelli 2004). Here, we are interested 
in the extremal quantile $Q_Y(\epsilon|X)$ of a return Y to a bank's portfolio, conditional on various
predictive variables X, such as the return to the market portfolio and the returns to portfo- 
lios of other related banks and mortgage providers. Unlike unconditional extremal quantiles, 
conditional extremal quantiles are useful for stress testing and analyzing the impact of ad- 
verse systemic events on the bank's performance. For example, we can analyze the impact 
of a large drop in the value of the market portfolio or of an associated company on the 
performance of the bank's portfolio. The results of this analysis are useful for determining
the level of capital that the bank needs to hold to prevent bankruptcy in unfavorable states 
of the world. Another example comes from health economics, where we are interested in 
the analysis of socio-economic determinants X of extreme quantiles of a child's birthweight 
Y or other health outcomes. In this example, very low birthweights are connected with 
substantial health problems for the child, and thus extremal quantile regression is useful for
identifying which factors can improve these negative health outcomes. We shall return to
these examples later in the empirical part of the paper. 

Another primary area of economic applications of extremal quantile regression deals with
describing approximate or probabilistic boundaries of economic outcomes conditional on 
pertinent factors. A first example in this area comes from efficiency analysis in the economics 
of regulation, where we are interested in the probabilistic production frontier $Q_Y(1 - \epsilon|X)$.
This frontier describes the level of production Y attained by the most productive $(1 - \epsilon) \times 100$
percent of firms, conditional on input factors X (Timmer 1971). A second example
comes from the analysis of job search in labor economics, where we are interested in the 
approximate reservation wage $Q_Y(\epsilon|X)$. This function describes the wage level, below
which the worker accepts a job only with a small probability $\epsilon$, conditional on worker
characteristics and other factors X (Flinn and Heckman 1982). A third example deals 
with estimating (S, s) rules in industrial organization and macroeconomics (Caballero and
Engel 1999). Recall that the (S, s) rule is an optimal policy for capital adjustment, in
which a firm allows its capital stock to gradually depreciate to a lower barrier, and once the 
barrier is reached, the firm adjusts its capital stock sharply to an upper barrier. Therefore, 
in a given cross-section of firms, the extremal conditional quantile functions $Q_Y(\epsilon|X)$ and
$Q_Y(1 - \epsilon|X)$ characterize the approximate adjustment barriers for observed capital stock
Y, conditional on a set of observed factors X.

The two areas of applications described above are either non-structural or semi-structural. 
A third principal area of economic applications of extremal quantile regressions is structural 
estimation of economic models. For instance, in procurement auction models, the key infor- 
mation about structural parameters is contained in the extreme or near-extreme conditional 
quantiles of bids given bidder and auction characteristics (see e.g. Chernozhukov and Hong 
(2004) and Hirano and Porter (2003)). We then can estimate and test a structural model 
based on its ability to accurately reproduce a collection of extremal conditional quantiles 
observed in the data. This indirect inference approach is called the method-of-quantiles 
(Koenker 2005). We refer the reader to Donald and Paarsch (2002) for a detailed example 
of this approach in the context of using k-sample extreme quantiles.



^ Caballero and Engel (1999) study approximate adjustment barriers using distribution models; obviously
quantile models can also be used.

^ In the previous examples, we can set $\epsilon$ to 0 to recover the exact, non-probabilistic, boundaries in the
case with no unobserved heterogeneity and no (even small) outliers in the data. Our inference methods cover
this exact extreme case, but we recommend avoiding it because it requires very stringent assumptions.

^ The method-of-quantiles allows us to estimate structural models both with and without parametric
unobserved heterogeneity. Moreover, the use of near-extreme quantiles instead of exact-extreme quantiles
makes the method more robust to a small fraction of outliers or neglected unobserved heterogeneity.

We now describe the contributions of this paper more specifically. This paper develops
feasible and practical inferential methods based on extreme value (EV) theory for QR,
namely, on the limit law theory for QR developed in Chernozhukov (2005) for cases where
the quantile index $\tau \in (0, 1)$ is either low, close to zero, or high, close to 1. Without
loss of generality we assume the former. By close to 0, we mean that the order of the
$\tau$-quantile, $\tau T$, defined as the product of the quantile index $\tau$ and the sample size T, obeys
$\tau T \to k < \infty$ as $T \to \infty$. Under this condition, the conventional normal laws, which
are based on the assumption that $\tau T$ diverges to infinity, fail to hold, and different EV
laws apply instead. These laws approximate the exact finite sample law of extremal QR
better than the conventional normal laws. In particular, we find that when the dimension-adjusted
order of the $\tau$-quantile, $\tau T/d$, defined as the ratio of the order of the $\tau$-quantile
to the number of regressors d, is not large, less than about 20 or 40, the EV laws may be
preferable to the normal laws, whereas the normal laws may become preferable otherwise.
We suggest this simple rule of thumb for choosing between the EV laws and normal laws, 
and refer the reader to Section 5 for more refined suggestions and recommendations. 
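
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names and the default cutoff of 30 are assumptions made here (the text says only "about 20 or 40"), and the treatment of high quantile indices by symmetry follows the "without loss of generality" remark.

```python
def dimension_adjusted_order(tau: float, T: int, d: int) -> float:
    """Return tau * T / d, the order of the tau-quantile per regressor."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    if T <= 0 or d <= 0:
        raise ValueError("T and d must be positive")
    # A high quantile index is treated like the low index 1 - tau.
    tail_tau = min(tau, 1.0 - tau)
    return tail_tau * T / d


def suggested_law(tau: float, T: int, d: int, threshold: float = 30.0) -> str:
    """Hypothetical helper: 'EV' below the cutoff, 'normal' otherwise.

    The cutoff is an assumed value inside the 20-40 range mentioned in the text.
    """
    return "EV" if dimension_adjusted_order(tau, T, d) < threshold else "normal"


if __name__ == "__main__":
    for tau in (0.025, 0.2, 0.3):
        print(tau, dimension_adjusted_order(tau, 200, 1), suggested_law(tau, 200, 1))
```

In borderline cases the cutoff should be varied rather than trusted; the refined recommendations are deferred to Section 5.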

Figure 1 illustrates the difference between the EV and normal approximations to the 
finite sample distribution of the extremal QR estimators. We plot the quantiles of these 
approximations against the quantiles of the exact finite sample distribution of the QR 
estimator. We consider different dimension-adjusted orders in a simple model with only
one regressor, d = 1, and T = 200. If either the EV law or the normal law were to coincide
with the true law, then their quantiles would fall exactly on the 45 degree line shown by
the solid line. We see from the plot that when the dimension-adjusted order $\tau T/d$ is 20 or
40, the quantiles of the EV law are indeed very close to the 45 degree line, and in fact are
much closer to this line than the quantiles of the normal law. Only when the
effective order $\tau T/d$ reaches 60 do the quantiles of the EV and normal laws become
comparably close to the 45 degree line.
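
As a rough illustration of how the "exact finite sample distribution" in such a comparison can be approximated, the following Python sketch simulates the design described for Figure 1 (Y = X + U with Cauchy U and X = 1). It assumes that, with a constant as the only regressor, the QR estimate can be taken to be an order statistic of the sample; the function names, the seed, and the number of replications are choices made here, not the authors' code.

```python
import math
import random


def cauchy_draw(rng: random.Random) -> float:
    """Standard Cauchy draw via the inverse-CDF transform."""
    return math.tan(math.pi * (rng.random() - 0.5))


def sample_quantile(values, tau):
    """Order-statistic quantile: the ceil(tau * n)-th smallest value."""
    ordered = sorted(values)
    # round() guards against floating-point noise in tau * n before ceil().
    index = max(math.ceil(round(tau * len(ordered), 9)) - 1, 0)
    return ordered[index]


def simulate_estimator(tau, T=200, reps=2000, seed=0):
    """Monte Carlo draws of the tau-quantile estimate in the design Y = 1 + U."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        y = [1.0 + cauchy_draw(rng) for _ in range(T)]
        draws.append(sample_quantile(y, tau))
    return draws


if __name__ == "__main__":
    for tau in (0.025, 0.2, 0.3):
        draws = simulate_estimator(tau)
        true_value = 1.0 + math.tan(math.pi * (tau - 0.5))
        # Compare the median of the simulated estimates with the true quantile.
        print(tau, true_value, sample_quantile(draws, 0.5))
```

Quantiles of these simulated draws, after centering and scaling, are what one would plot against the quantiles of the EV and normal approximations.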

A major problem with implementing the EV approach, at least in its pure form, is its 
infeasibility for inference purposes. Indeed, EV approximations rely on canonical normal- 
izing constants to achieve non-degenerate asymptotic laws. Consistent estimation of these 
constants is generally not possible, at least without making additional strong assumptions. 
This difficulty is also encountered in the classical non-regression case; see, for instance, 
Bertail, Haefke, Politis, and White (2004) for discussion. Furthermore, universal infer- 
ence methods such as the bootstrap fail due to the nonstandard behavior of extremal QR 
statistics; see Bickel and Freedman (1981) for a proof in the classical non-regression case. 
[Figure 1 panels: A. $\tau = .025$, T = 200, $\tau T = 5$; B. $\tau = .2$, T = 200, $\tau T = 40$; C. $\tau = .3$, T = 200, $\tau T = 60$.]

Figure 1. Quantiles of the true law of QR vs. quantiles of EV and
normal laws. The figure is based on a simple design with Y = X + U,
where U follows a Cauchy distribution and X = 1. The solid line
shows the actual quantiles of the true distribution of QR with quantile
index $\tau \in \{.025, .2, .3\}$. The dashed line "- - -" shows the quantiles of
the conventional normal law for QR, and the dotted line shows the
quantiles of the EV law for QR. The figure is based on 10,000 Monte Carlo
replications and plots quantiles over the 99% range.

Conventional subsampling methods with and without replacement are also inconsistent be- 
cause the QR statistic diverges in the unbounded support case. Moreover, they require 
consistent estimation of normalizing constants, which is not feasible in general. 

In this paper we develop two types of inference approaches that overcome all of the diffi- 
culties mentioned above: a resampling approach and an analytical approach. We favor the 
first approach due to its ease of implementation in practice. At the heart of both approaches 
is the use of self-normalized QR (SN-QR) statistics that employ random normalization fac- 
tors, instead of generally infeasible normalization by canonical constants. The use of SN-QR 
statistics allows us to derive feasible limit distributions, which underlie either of our infer- 
ence approaches. Moreover, our resampling approach is a suitably modified subsampling 
method applied to SN-QR statistics. This approach entirely avoids estimating not only 
the canonical normalizing constants, but also all other nuisance tail parameters, which in 
practice may be difficult to estimate reliably. Our construction fruitfully exploits the special 
relationship between the rates of convergence/divergence of extremal and intermediate QR
statistics, which allows for a valid estimation of the centering constants in subsamples. For 
completeness we also provide inferential methods for canonically-normalized QR (CN-QR) 
statistics, but we also show that their feasibility requires much stronger assumptions. 

The remainder of the paper is organized as follows. Section 2 describes the model and 
regularity conditions, and gives an intuitive overview of the main results. Section 3 estab- 
lishes the results that underlie the inferential procedures. Section 4 describes methods for 
estimating critical values. Section 5 compares inference methods based on EV and normal 
approximations through a Monte Carlo experiment. Section 6 presents empirical examples, 
and the Appendix collects proofs and all other figures. 



2. The Set Up and Overview of Results

2.1. Some Basics. Let a real random variable Y have a continuous distribution function
$F_Y(y) = \Pr[Y \le y]$. A $\tau$-quantile of Y is $Q_Y(\tau) = \inf\{y : F_Y(y) > \tau\}$ for some $\tau \in (0, 1)$.
Let X be a vector of covariates related to Y, and $F_Y(y|x) = \Pr[Y \le y | X = x]$ denote
the conditional distribution function of Y given X = x. The conditional $\tau$-quantile of Y
given X = x is $Q_Y(\tau|x) = \inf\{y : F_Y(y|x) > \tau\}$ for some $\tau \in (0, 1)$. We refer to $Q_Y(\tau|x)$,
viewed as a function of x, as the $\tau$-quantile regression function. This function measures the
effect of covariates on outcomes, both at the center and at the upper and lower tails of the
outcome distribution. A conditional $\tau$-quantile is extremal whenever the probability index
$\tau$ is either low or high in a sense that we will make more precise below. Without loss of
generality, we focus the discussion on low quantiles.

Consider the classical linear functional form for the conditional quantile function of Y
given X = x:

$$Q_Y(\tau|x) = x'\beta(\tau), \text{ for all } \tau \in \mathcal{I} = (0, \eta], \text{ for some } \eta \in (0, 1], \qquad (2.1)$$

and for every $x \in \mathbf{X}$, the support of X. This linear functional form is flexible in the sense
that it has good approximation properties. Indeed, given an original regressor X*, the final
set of regressors X can be formed as a vector of approximating functions. For example, X
may include power functions, splines, and other transformations of X*.

Given a sample of T observations $\{Y_t, X_t, t = 1, \dots, T\}$, the $\tau$-quantile QR estimator $\hat\beta(\tau)$
solves:

$$\hat\beta(\tau) \in \arg\min_{\beta \in \mathbb{R}^d} \sum_{t=1}^{T} \rho_\tau(Y_t - X_t'\beta), \qquad (2.2)$$
where $\rho_\tau(u) = (\tau - 1(u < 0))u$ is the asymmetric absolute deviation function. The median
$\tau = 1/2$ case of (2.2) was introduced by Laplace (1818) and the general quantile formulation
(2.2) by Koenker and Bassett (1978).
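
To make the asymmetric absolute deviation function concrete, here is a minimal Python sketch for the special case in which the only regressor is a constant, so that the minimization is over a single number. The helper names are hypothetical, and the search over observed values relies on the fact that a piecewise linear convex objective attains its minimum at one of its kinks, which here are the data points.

```python
def check_loss(u: float, tau: float) -> float:
    """Asymmetric absolute deviation: rho_tau(u) = (tau - 1(u < 0)) * u."""
    return (tau - (1.0 if u < 0 else 0.0)) * u


def objective(b: float, y, tau: float) -> float:
    """Sum of check losses of the residuals y_t - b."""
    return sum(check_loss(yt - b, tau) for yt in y)


def intercept_only_qr(y, tau: float) -> float:
    """Minimize the objective over the observed values (intercept-only model)."""
    return min(sorted(y), key=lambda b: objective(b, y, tau))


if __name__ == "__main__":
    y = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6, 5.0, 3.5, 8.0, 7.0]
    print(intercept_only_qr(y, 0.25))
```

With general regressors the same objective is usually minimized by linear programming; that step is not shown here.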

QR coefficients $\hat\beta(\tau)$ can be seen as order statistics in the regression setting. Accordingly,
we will refer to $\tau T$ as the order of the $\tau$-quantile. A sequence of quantile index-sample size
pairs $\{\tau_T, T\}_{T=1}^{\infty}$ is said to be an extreme order sequence if $\tau_T \searrow 0$ and $\tau_T T \to k \in (0, \infty)$ as
$T \to \infty$; an intermediate order sequence if $\tau_T \searrow 0$ and $\tau_T T \to \infty$ as $T \to \infty$; and a central
order sequence if $\tau_T$ is fixed as $T \to \infty$. Each type of sequence leads to different asymptotic
approximations to the finite-sample distribution of the QR estimator. The extreme order 
sequence leads to an extreme value (EV) law in large samples, whereas the intermediate 
and central sequences lead to normal laws. As we saw in Figure 1, the EV law provides a 
better approximation to the finite sample law of the QR estimator than the normal law. 

2.2. Pareto-type or Regularly Varying Tails. In order to develop inference theory for 
extremal QR, we assume the tails of the conditional distribution of the outcome variable 
have Pareto-type behavior, as we formally state in the next subsection. In this subsection, 
we recall and discuss the concept of Pareto-type tails. The (lower) tail of a distribution 
function has Pareto-type behavior if it decays approximately as a power function, or more 
formally, a regularly varying function. The tails of the said form are prevalent in economic 
data, as discovered by V. Pareto in 1895. Pareto-type tails encompass or approximate
a rich variety of tail behavior, including that of thick-tailed and thin-tailed distributions, 
having either bounded or unbounded support. 

More formally, consider a random variable Y and define a random variable U as U = Y,
if the lower end-point of the support of Y is $-\infty$, or $U = Y - Q_Y(0)$, if the lower end-point
of the support of Y is $Q_Y(0) > -\infty$. The quantile function of U, denoted by $Q_U$, then has
lower end-point $Q_U(0) = -\infty$ or $Q_U(0) = 0$. The assumption that the quantile function $Q_U$
and its distribution function $F_U$ exhibit Pareto-type behavior in the tails can be formally
stated as the following two equivalent conditions:

$$Q_U(\tau) \sim L(\tau) \cdot \tau^{-\xi} \text{ as } \tau \searrow 0, \qquad (2.3)$$
$$F_U(u) \sim \bar L(u) \cdot u^{-1/\xi} \text{ as } u \searrow Q_U(0), \qquad (2.4)$$

for some real number $\xi \ne 0$, where $L(\tau)$ is a nonparametric slowly-varying function at
0, and $\bar L(u)$ is a nonparametric slowly-varying function at $Q_U(0)$. The leading examples
of slowly-varying functions are the constant function and the logarithmic function. The
number $\xi$ defined in (2.3) and (2.4) is called the EV index.

^ Pareto called the tails of this form "A Distribution Curve for Wealth and Incomes." Further empirical
substantiation has been given by Sen (1973), Zipf (1949), Mandelbrot (1963), and Fama (1965), among
others. The mathematical theory of regular variation in connection to extreme value theory has been
developed by Karamata, Gnedenko, and de Haan.

^ The notation $a \sim b$ means that $a/b \to 1$ as appropriate limits are taken.

The absolute value of $\xi$ measures the heavy-tailedness of the distribution. A distribution
$F_Y$ with Pareto-type tails necessarily has a finite lower support point if $\xi < 0$ and an infinite
lower support point if $\xi > 0$. Distributions with $\xi > 0$ include stable, Pareto, Student's t,
and many other distributions. For example, the t distribution with $\nu$ degrees of freedom
has $\xi = 1/\nu$ and exhibits a wide range of tail behavior. In particular, setting $\nu = 1$ yields
the Cauchy distribution which has heavy tails with $\xi = 1$, while setting $\nu = 30$ gives a
distribution which has light tails with $\xi = 1/30$, and which is very close to the normal
distribution. On the other hand, distributions with $\xi < 0$ include the uniform, exponential,
Weibull, and many other distributions.
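
The Cauchy example can be checked numerically. The sketch below compares the standard Cauchy quantile function with a pure power function $L \cdot \tau^{-\xi}$, taking $\xi = 1$ and the constant $L = -1/\pi$; this constant is derived here from the closed-form Cauchy quantile and is not stated in the text. The ratio of the two should approach 1 as $\tau$ decreases to 0.

```python
import math


def cauchy_quantile(tau: float) -> float:
    """Quantile function of the standard Cauchy distribution."""
    # tan(pi * (tau - 1/2)) rewritten as -1 / tan(pi * tau) for accuracy near 0.
    return -1.0 / math.tan(math.pi * tau)


def pareto_approximation(tau: float, xi: float = 1.0, scale: float = -1.0 / math.pi) -> float:
    """Power-function tail scale * tau**(-xi) with a constant slowly varying part."""
    return scale * tau ** (-xi)


if __name__ == "__main__":
    for tau in (0.1, 0.01, 0.001):
        print(tau, cauchy_quantile(tau) / pareto_approximation(tau))
```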

It should be mentioned that the case of $\xi = 0$ corresponds to the class of rapidly varying
distribution functions. These distribution functions have exponentially light tails, with the
normal and exponential distributions being the chief examples. To simplify the exposition,
we do not discuss this case explicitly. However, since the limit distributions of the main
statistics are continuous in $\xi$, including at $\xi = 0$, inference theory for the case of $\xi = 0$ is
also included by taking $\xi \to 0$.

2.3. The Extremal Conditional Quantile Model and Sampling Conditions. With
these notions in mind, our main assumption is that the response variable Y, transformed
by some auxiliary regression line, $X'\beta_e$, has Pareto-type tails with EV index $\xi$.

C1. The conditional quantile function of Y given X = x satisfies equation (2.1) a.s. Moreover,
there exists an auxiliary extremal regression parameter $\beta_e \in \mathbb{R}^d$, such that the disturbance
$V = Y - X'\beta_e$ has end-point s = 0 or $s = -\infty$ a.s., and its conditional quantile
function $Q_V(\tau|x)$ satisfies the following tail-equivalence relationship:

$$Q_V(\tau|x) \sim x'\gamma \cdot Q_U(\tau), \text{ as } \tau \searrow 0, \text{ uniformly in } x \in \mathbf{X} \subseteq \mathbb{R}^d,$$

for some quantile function $Q_U(\tau)$ that exhibits Pareto-type tails with EV index $\xi$ (i.e., it
satisfies (2.3)), and some vector parameter $\gamma$ such that $E[X]'\gamma = 1$.

Since this assumption only affects the tails, it allows covariates to affect the extremal
quantile and the central quantiles very differently. Moreover, the local effect of covariates
in the tail is approximately given by $\beta(\tau) \approx \beta_e + \gamma Q_U(\tau)$, which allows for a differential
impact of covariates across various extremal quantiles.

^ A function $u \mapsto L(u)$ is said to be slowly-varying at s if $\lim_{l \searrow s} [L(l)/L(ml)] = 1$ for any $m > 0$.

C2. The conditional quantile density function $\partial Q_V(\tau|x)/\partial\tau$ exists and satisfies the tail
equivalence relationship $\partial Q_V(\tau|x)/\partial\tau \sim x'\gamma \cdot \partial Q_U(\tau)/\partial\tau$ as $\tau \searrow 0$, uniformly in $x \in \mathbf{X}$,
where $\partial Q_U(\tau)/\partial\tau$ exhibits Pareto-type tails as $\tau \searrow 0$ with EV index $\xi + 1$.

Assumption C2 strengthens C1 by imposing the existence and Pareto-type behavior of
the conditional quantile density function. We impose C2 to facilitate the derivation of the 
main inferential results. 

The following sampling conditions will be imposed. 

C3. The regressor vector $X = (1, Z')'$ is such that it has a compact support $\mathbf{X}$, the matrix
$E[XX']$ is positive definite, and its distribution function $F_X$ satisfies a non-lattice condition
stated in the mathematical appendix (this condition is satisfied, for instance, when Z is 
absolutely continuous). 

Compactness is needed to ensure the continuity and robustness of the mapping from 
extreme events in Y to the extremal QR statistics. Even if X is not compact, we can 
select the data for which X belongs to a compact region. The non-degeneracy condition of 
E[XX'] is standard and guarantees invertibility. The non-lattice condition is required for 
the existence of the finite-sample density of QR coefficients. It is needed even asymptotically 
because the asymptotic distribution theory of extremal QR closely resembles the finite- 
sample theory for QR, which is not a surprise given the rare nature of events that have a 
probability of order 1/T. 

We assume the data are either i.i.d. or weakly dependent. 

C4. The sequence $\{W_t\}$ with $W_t = (Y_t, X_t)$ and $V_t$ defined in C1, forms a stationary,
strongly mixing process with a geometric mixing rate, that is, for some C > 0

$$\sup_t \sup_{A \in \mathcal{A}_t, B \in \mathcal{B}_{t+m}} |P(A \cap B) - P(A)P(B)| \exp(Cm) \to 0 \text{ as } m \to \infty,$$

where $\mathcal{A}_t = \sigma(W_t, W_{t-1}, \dots)$ and $\mathcal{B}_t = \sigma(W_t, W_{t+1}, \dots)$. Moreover, the sequence satisfies a
condition that curbs clustering of extreme events in the following sense:
$P(V_t \le K, V_{t+j} \le K | \mathcal{A}_t) \le C P(V_t \le K | \mathcal{A}_t)^2$ for all $K \in [s, \bar K]$, uniformly for all $j \ge 1$ and uniformly for all
$t \ge 1$; here C > 0 and $\bar K > s$ are some constants.

A special case of this condition is when the sequence of variables $\{(V_t, X_t), t \ge 1\}$, or
equivalently $\{(Y_t, X_t), t \ge 1\}$, is independent and identically distributed. The assumption
of mixing for $\{(V_t, X_t), t \ge 1\}$ is standard in econometric analysis (White 1990), and it is
equivalent to the assumption of mixing of $\{(Y_t, X_t), t \ge 1\}$. The non-clustering condition is
of the Meyer (1973)-type and states that the probability of two extreme events co-occurring 
at nearby dates is much lower than the probability of just one extreme event. For example, 
it assumes that a large market crash is not likely to be immediately followed by another large 
crash. This assumption leads to limit distributions of QRs as if independent sampling had 
taken place. The plausibility of the non-clustering assumption is an empirical matter. We 
conjecture that our primary inference method based on subsampling is valid more generally, 
under conditions that preserve the rates of convergence of QR statistics and ensure existence 
of their asymptotic distributions. Finally we note that the assumptions made here could be 
relaxed in certain directions for some of the results stated below, but we decided to state a 
single set of sufficient assumptions for all the results. 

2.4. Overview and Discussion of Inferential Results. We begin by briefly revisiting
the classical non-regression case to describe some intuition and the key obstacles to per- 
forming feasible inference in our more general regression case. Then we will describe our 
main inferential results for the regression case. It is worth noting that our main inferential 
methods, based on self-normalized statistics, are new and of independent interest even in 
the classical non-regression case. 

Recall the following classical result on the limit distribution of the extremal sample
quantiles $\hat Q_Y(\tau)$ (Gnedenko 1943): for any integer $k \ge 1$ and $\tau = k/T$, as $T \to \infty$,

$$\hat Z_T(k) = A_T(\hat Q_Y(\tau) - Q_Y(\tau)) \to_d \hat Z_\infty(k) = \Gamma_k^{-\xi} - k^{-\xi}, \qquad (2.5)$$

where

$$A_T = 1/Q_U(1/T), \quad \Gamma_k = \mathcal{E}_1 + \dots + \mathcal{E}_k, \qquad (2.6)$$

and $(\mathcal{E}_1, \mathcal{E}_2, \dots)$ is an independent and identically distributed sequence of standard exponential
variables. We refer to $\hat Z_T(k)$ as the canonically normalized (CN) statistic because it
depends on the scaling constant $A_T$. The variables $\Gamma_k$, entering the definition of the EV
distribution, are gamma random variables. The limit distribution of the k-th order statistic is
therefore a transformation of a gamma variable. The EV distribution is not symmetric and
may have significant (median) bias; it has finite moments of all orders if $\xi < 0$ and has finite moments
of up to order $1/\xi$ if $\xi > 0$. The presence of median bias motivates the use of median-bias
correction techniques, which we discuss in the regression case below.
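
The median bias of the EV limit can be seen in a short simulation. The Python sketch below draws from the variable $\Gamma_k^{-\xi} - k^{-\xi}$, reading $\Gamma_k$ as a sum of k iid standard exponential variables as in (2.5) and (2.6); the function names, the seed, and the number of draws are choices made for this illustration.

```python
import random
import statistics


def gamma_k(k: int, rng: random.Random) -> float:
    """Sum of k iid standard exponential variables."""
    return sum(rng.expovariate(1.0) for _ in range(k))


def cn_limit_draws(k: int, xi: float, reps: int = 20000, seed: int = 0):
    """Draws of Gamma_k**(-xi) - k**(-xi), the canonically normalized limit."""
    rng = random.Random(seed)
    return [gamma_k(k, rng) ** (-xi) - k ** (-xi) for _ in range(reps)]


if __name__ == "__main__":
    for xi in (1.0, -0.5):
        draws = cn_limit_draws(k=5, xi=xi)
        # A non-zero median signals the median bias discussed in the text.
        print(xi, statistics.median(draws))
```

Because the median of a gamma variable lies below its mean, the simulated median should be positive for $\xi > 0$ and negative for $\xi < 0$.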

Although very powerful, this classical result is not feasible for purposes of inference
on $Q_Y(\tau)$, since the scaling constant $A_T$ cannot in general be estimated consistently
(Bertail, Haefke, Politis, and White 2004). One way to deal with this problem
is to add strong parametric assumptions on the non-parametric, slowly-varying function
$L(\cdot)$ in equation (2.3) in order to estimate $A_T$ consistently. For instance, suppose that
$Q_U(\tau) \sim L\tau^{-\xi}$. Then one can estimate $\xi$ by the classical Hill or Pickands estimators, and L
by $\hat L = (\hat Q_Y(2\tau) - \hat Q_Y(\tau))/((2^{-\hat\xi} - 1)\tau^{-\hat\xi})$. We develop the necessary theoretical results for
the regression analog of this approach, although we will not recommend it as our preferred
method.
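
A minimal Python sketch of this parametric route in the non-regression case follows. It assumes an exact power tail $Q_Y(\tau) = -\tau^{-\xi}$ with $\xi > 0$ (so that L = -1 and the lower end-point is $-\infty$), uses the textbook Hill estimator applied to the most negative observations, and then plugs the result into the formula for the estimate of L given above. All function names and tuning choices (T, k, seed) are hypothetical.

```python
import math
import random


def simulate_lower_tail(T: int, xi: float, seed: int = 0):
    """Draws with exact quantile function Q(tau) = -tau**(-xi), so L = -1."""
    rng = random.Random(seed)
    # 1 - random() lies in (0, 1], which avoids raising zero to a negative power.
    return [-((1.0 - rng.random()) ** (-xi)) for _ in range(T)]


def hill_lower_tail(data, k: int) -> float:
    """Hill estimator of xi > 0 based on the k most negative observations."""
    magnitudes = sorted((-y for y in data if y < 0), reverse=True)
    if not 0 < k < len(magnitudes):
        raise ValueError("k must be positive and below the number of negative values")
    threshold = magnitudes[k]  # the (k + 1)-th largest magnitude
    return sum(math.log(magnitudes[i] / threshold) for i in range(k)) / k


def sample_quantile(data, tau: float) -> float:
    """Order-statistic quantile: the ceil(tau * n)-th smallest value."""
    ordered = sorted(data)
    return ordered[max(math.ceil(round(tau * len(ordered), 9)) - 1, 0)]


def scale_estimate(data, tau: float, xi_hat: float) -> float:
    """(Q_hat(2 tau) - Q_hat(tau)) / ((2**(-xi_hat) - 1) * tau**(-xi_hat))."""
    spread = sample_quantile(data, 2 * tau) - sample_quantile(data, tau)
    return spread / ((2.0 ** (-xi_hat) - 1.0) * tau ** (-xi_hat))


if __name__ == "__main__":
    data = simulate_lower_tail(T=20000, xi=0.5)
    xi_hat = hill_lower_tail(data, k=400)
    print(xi_hat, scale_estimate(data, tau=400 / 20000, xi_hat=xi_hat))
```

The quality of both estimates depends heavily on the exact power-tail assumption, which is the reason the text treats this route as requiring strong restrictions.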

Our preferred and main proposal to deal with the aforementioned infeasibility problem
is to consider the asymptotics of the self-normalized (SN) sample quantiles

$$Z_T(k) = \mathcal{A}_T(\hat Q_Y(\tau) - Q_Y(\tau)) \to_d Z_\infty(k) = \frac{\sqrt{k}(\Gamma_k^{-\xi} - k^{-\xi})}{\Gamma_{mk}^{-\xi} - \Gamma_k^{-\xi}}, \qquad (2.7)$$

where, for $m > 1$ such that $mk$ is an integer,

$$\mathcal{A}_T = \frac{\sqrt{\tau T}}{\hat Q_Y(m\tau) - \hat Q_Y(\tau)}. \qquad (2.8)$$

Here, the scaling factor $\mathcal{A}_T$ is completely a function of data and therefore feasible. Moreover,
we completely avoid the need for consistent estimation of $A_T$. This is convenient because
we are not interested in this normalization constant per se. The limit distribution in (2.7)
only depends on the EV index $\xi$, and its quantiles can be easily obtained by simulation. In
the regression setting, where the limit law is a bit more complicated, we develop a form of
subsampling to perform both practical and feasible inference.
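
The remark that the quantiles of the self-normalized limit "can be easily obtained by simulation" can be illustrated as follows. The Python sketch takes the limit variable to be $\sqrt{k}(\Gamma_k^{-\xi} - k^{-\xi})/(\Gamma_{mk}^{-\xi} - \Gamma_k^{-\xi})$, with $\Gamma_{mk}$ built from the same exponential variables as $\Gamma_k$; it restricts m to integers for simplicity, and the function names, seed, and number of draws are assumptions of this illustration.

```python
import math
import random


def sn_limit_draws(k: int, m: int, xi: float, reps: int = 20000, seed: int = 0):
    """Draws of sqrt(k) (G_k^-xi - k^-xi) / (G_mk^-xi - G_k^-xi)."""
    if k < 1 or m < 2:
        raise ValueError("need k >= 1 and an integer m >= 2")
    if xi == 0:
        raise ValueError("xi must be non-zero")
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        gamma_k = sum(rng.expovariate(1.0) for _ in range(k))
        # Gamma_{mk} shares its first k exponential terms with Gamma_k.
        gamma_mk = gamma_k + sum(rng.expovariate(1.0) for _ in range((m - 1) * k))
        numerator = math.sqrt(k) * (gamma_k ** (-xi) - k ** (-xi))
        denominator = gamma_mk ** (-xi) - gamma_k ** (-xi)
        draws.append(numerator / denominator)
    return draws


def empirical_quantile(draws, alpha: float) -> float:
    """The ceil(alpha * n)-th smallest simulated value."""
    ordered = sorted(draws)
    index = min(max(math.ceil(alpha * len(ordered)) - 1, 0), len(ordered) - 1)
    return ordered[index]


if __name__ == "__main__":
    draws = sn_limit_draws(k=5, m=2, xi=1.0)
    for alpha in (0.05, 0.5, 0.95):
        print(alpha, empirical_quantile(draws, alpha))
```

In practice $\xi$ is unknown, so such simulated quantiles would have to be computed at an estimate of $\xi$; this is one motivation for a subsampling alternative.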

Let us now turn to the regression case. Here, we can also consider a canonically-normalized
QR statistic (CN-QR):

$$\hat Z_T(k) := A_T(\hat\beta(\tau) - \beta(\tau)) \text{ for } A_T := 1/Q_U(1/T), \qquad (2.9)$$

and a self-normalized QR (SN-QR) statistic:

$$Z_T(k) := \mathcal{A}_T(\hat\beta(\tau) - \beta(\tau)) \text{ for } \mathcal{A}_T := \frac{\sqrt{\tau T}}{\bar X_T'(\hat\beta(m\tau) - \hat\beta(\tau))}, \qquad (2.10)$$

where $\bar X_T = \sum_{t=1}^{T} X_t/T$ and m is a real number such that $\tau T(m - 1) > d$. The first statistic
uses an infeasible canonical normalization $A_T$, whereas the second statistic uses a feasible
random normalization. First, we show that

$$\hat Z_T(k) \to_d \hat Z_\infty(k) \qquad (2.11)$$
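where for $\chi = 1$ if $\xi < 0$ and $\chi = -1$ if $\xi > 0$,

$$\hat Z_\infty(k) := \chi \cdot \arg\min_{z \in \mathbb{R}^d} \Big[ -k E[X]'(z + k^{-\xi}\gamma) + \sum_{i=1}^{\infty} \{X_i'(z + k^{-\xi}\gamma) - \chi \cdot \Gamma_i^{-\xi} \cdot X_i'\gamma\}_+ \Big] \qquad (2.12)$$

where $\{\Gamma_1, \Gamma_2, \dots\} := \{\mathcal{E}_1, \mathcal{E}_1 + \mathcal{E}_2, \dots\}$; $\{\mathcal{E}_1, \mathcal{E}_2, \dots\}$ is an iid sequence of exponential variables
that is independent of $\{X_1, X_2, \dots\}$, an iid sequence with distribution $F_X$; and $\{y\}_+ :=
\max(0, y)$. Furthermore, we show that

$$Z_T(k) \to_d Z_\infty(k) := \frac{\sqrt{k}\,\hat Z_\infty(k)}{E[X]'(\hat Z_\infty(mk) - \hat Z_\infty(k)) + \chi \cdot (m^{-\xi} - 1)k^{-\xi}}. \qquad (2.13)$$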

The limit laws here are more complicated than in the non-regression case, but they share
some common features. Indeed, the limit laws depend on the variables $\Gamma_i$ in a crucial
way, are not necessarily centered at zero, and can have significant first order median
biases. Motivated by the presence of the first order bias, we develop bias corrections for
the QR statistics in the next section. Moreover, just as in the non-regression case, the limit
distribution of the CN-QR statistic in (2.12) is generally infeasible for inference purposes.
We need to know or estimate the scaling constant $A_T$, which is the reciprocal of the extremal
quantile of the variable U defined in C1. That is, we require an estimator $\hat A_T$ such that
$\hat A_T/A_T \to_p 1$, which is not feasible unless the tail of U satisfies additional strong parametric
restrictions. We provide additional restrictions below that facilitate estimation of $A_T$ and
hence inference based on CN-QR, although this is not our preferred inferential method.

Our main and preferred proposal for inference is based on the SN-QR statistic, which does
not depend on $A_T$. We estimate the distribution of this statistic using either a variation
of subsampling or an analytical method. A key ingredient here is the feasible normalizing
variable $\mathcal{A}_T$, which is randomly proportional to the canonical normalization $A_T$, in the sense
that $\mathcal{A}_T/A_T$ is a random variable in the limit. An advantage of the subsampling method
over the analytical methods is that it does not require estimation of the nuisance parameters
$\xi$ and $\gamma$. Our subsampling approach is different from conventional subsampling in the
use of recentering terms and random normalization. Conventional subsampling that uses
recentering by the full sample estimate $\hat\beta(\tau)$ is not consistent when that estimate is diverging;
and here we indeed have $A_T \to 0$ when $\xi > 0$. Instead, we recenter by intermediate order
QR estimates in subsamples, which will diverge at a slow enough speed to estimate the limit
distribution of SN-QR consistently. Thus, our subsampling approach exploits the special
relationship between the rates of convergence/divergence of extremal and intermediate QR
statistics and should be of independent interest even in a non-regression setting.

^ The idea of feasible random normalization has been used in other contexts (e.g. t-statistics). In extreme
value theory, Dekkers and de Haan (1989) applied a similar random normalization idea to extrapolated
quantile estimators of intermediate order in the non-regression setting, precisely to produce limit distributions
that can be easily used for inference. In time series, Kiefer, Vogelsang, and Bunzel (2000) have used feasible
inconsistent estimates of the variance of asymptotically normal estimators.

This paper contributes to the existing literature by introducing general feasible inference 
methods for extremal quantile regression. Our inferential methods rely in part on the limit 
results in Chernozhukov (2005), who derived EV limit laws for CN-QR under the extreme 
order condition $\tau T \to k > 0$. This theory, however, did not lead directly to any feasible,
practical inference procedure. Feigin and Resnick (1994), Chernozhukov (1998), Portnoy
and Jureckova (1999), and Knight (2001) provide related limit results for canonically normalized
linear programming estimators where $\tau T \searrow 0$, all in different contexts and at
various levels of generality. These limit results likewise did not provide feasible inference
theory. The linear programming estimator is well suited to the problem of estimating finite
deterministic boundaries of data, as in image processing and other technometric applications.
In contrast, the current approach of taking $\tau T \to k > 0$ is more suited to econometric
applications, where interest focuses on the "usual" quantiles located near the minimum or 
maximum and where the boundaries may be unlimited. However, some of our theoretical 
developments are motivated by and build upon this previous literature. Some of our proofs 
rely on the elegant epi-convergence framework of Geyer (1996) and Knight (1999). 

3. Inference and Median-Unbiased Estimation Based on Extreme Value Laws

This section establishes the main results that underlie our inferential procedures. 

3.1. Extreme Value Laws for CN-QR and SN-QR Statistics. Here we verify that
the CN-QR statistic $\hat Z_T(k)$ and SN-QR statistic $Z_T(k)$ converge to the limit variables
$\hat Z_\infty(k)$ and $Z_\infty(k)$, under the condition that $\tau T \to k > 0$ as $T \to \infty$.

Theorem 1 (Limit Laws for Extremal SN-QR and CN-QR). Suppose conditions C1, C3
and C4 hold. Then as $\tau T \to k > 0$ and $T \to \infty$, (1) the SN-QR statistic of order k obeys

$$Z_T(k) \to_d Z_\infty(k),$$

for any m such that $k(m - 1) > d$, and (2) the CN-QR statistic of order k obeys

$$\hat Z_T(k) \to_d \hat Z_\infty(k).$$

Comment 3.1. The condition that $k(m - 1) > d$ in the definition of SN-QR ensures that
$\hat\beta(m\tau) \ne \hat\beta(\tau)$ and therefore the normalization by $\mathcal{A}_T$ is well defined. This is a consequence
of Theorem 3.2 in Bassett and Koenker (1982) and existence of the conditional density of
Y imposed in assumption C2. Result 1 on SN-QR statistics is the main new result that we
will exploit for inference. Result 2 on CN-QR statistics is needed primarily for auxiliary
purposes. Chernozhukov (2005) presents some extensions of result 2.

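Comment 3.2. When $Q_Y(0|x) > -\infty$, by C1 $Q_Y(0|x)$ is equal to $x'\beta_e$ and is the conditional 
lower boundary of Y. The proof of Theorem 1 shows that 

$$A_T(\hat\beta(\tau) - \beta_e) \to_d \tilde Z_\infty(k) := \hat Z_\infty(k) + k^{-\xi}\gamma \quad\text{and}\quad \mathcal{A}_T(\hat\beta(\tau) - \beta_e) \to_d \frac{\sqrt{k}\,\tilde Z_\infty(k)}{E[X]'(\tilde Z_\infty(mk) - \tilde Z_\infty(k))}.$$ 

We can use these results and the analytical and subsampling methods presented below to 
perform median-unbiased estimation and inference on the boundary parameter $\beta_e$. 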

3.2. Generic Inference and Median-Unbiased Estimation. We outline two procedures 
for conducting inference and constructing asymptotically median-unbiased estimates 
of linear functions $\psi'\beta(\tau)$ of the coefficient vector $\beta(\tau)$, for some non-zero vector $\psi$. 

1. Median-Unbiased Estimation and Inference Using SN-QR. By Theorem 1, 
$\psi'\mathcal{A}_T(\hat\beta(\tau) - \beta(\tau)) \to_d \psi' Z_\infty(k)$. Let $c_\alpha$ denote the $\alpha$-quantile of $\psi' Z_\infty(k)$ for $0 < \alpha \le .5$. 
Given $\hat c_\alpha$, a consistent estimate of $c_\alpha$, we can construct an asymptotically median-unbiased 
estimator and a $100(1 - \alpha)\%$ confidence interval for $\psi'\beta(\tau)$ as 

$$\psi'\hat\beta(\tau) - \hat c_{1/2}/\mathcal{A}_T \quad\text{and}\quad [\psi'\hat\beta(\tau) - \hat c_{1-\alpha/2}/\mathcal{A}_T,\ \psi'\hat\beta(\tau) - \hat c_{\alpha/2}/\mathcal{A}_T],$$ 

respectively. The bias-correction term and the limits of the confidence interval depend on 
the random scaling $\mathcal{A}_T$. We provide consistent estimates of $c_\alpha$ in the next section. 

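Theorem 2 (Inference and median-unbiased estimation using SN-QR). Under the conditions 
of Theorem 1, suppose we have $\hat c_\alpha$ such that $\hat c_\alpha \to_p c_\alpha$. Then, 

$$\lim_{T\to\infty} P\{\psi'\hat\beta(\tau) - \hat c_{1/2}/\mathcal{A}_T \le \psi'\beta(\tau)\} = 1/2$$ 

and 

$$\lim_{T\to\infty} P\{\psi'\hat\beta(\tau) - \hat c_{1-\alpha/2}/\mathcal{A}_T \le \psi'\beta(\tau) \le \psi'\hat\beta(\tau) - \hat c_{\alpha/2}/\mathcal{A}_T\} = 1 - \alpha.$$ 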
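
The construction in Theorem 2 is a direct plug-in once the point estimate, the random scale, and the estimated critical values are available. The following minimal sketch shows the arithmetic only; the function name and argument names are illustrative assumptions, not part of the paper, and the scale $\mathcal{A}_T$ is assumed to be positive.

```python
def sn_qr_interval(estimate, scale, c_low, c_median, c_high):
    """Median-unbiased estimate and confidence interval from SN-QR.

    estimate : psi' beta_hat(tau), the QR point estimate of the linear function
    scale    : the self-normalizing random scale (assumed positive)
    c_low, c_median, c_high : estimated alpha/2, 1/2 and 1 - alpha/2 quantiles
                              of the limit variable psi' Z_inf(k)
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if not (c_low <= c_median <= c_high):
        raise ValueError("critical values must be ordered")
    median_unbiased = estimate - c_median / scale
    lower = estimate - c_high / scale   # the upper quantile gives the lower limit
    upper = estimate - c_low / scale    # the lower quantile gives the upper limit
    return median_unbiased, (lower, upper)
```

Note that the interval is generally not symmetric around the point estimate, because the limit law of the SN-QR statistic is not symmetric.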

2. Median-Unbiased Estimation and Inference Using CN-QR. By Theorem 1, 
$\psi' A_T(\hat\beta(\tau) - \beta(\tau)) \to_d \psi' \hat Z_\infty(k)$. Let $c'_\alpha$ denote the $\alpha$-quantile of $\psi' \hat Z_\infty(k)$ for $0 < \alpha \le .5$. 
Given $\hat A_T$, a consistent estimate of $A_T$, and $\hat c'_\alpha$, a consistent estimate of $c'_\alpha$, we can construct 
an asymptotically median-unbiased estimator and a $100(1 - \alpha)\%$ confidence interval for $\psi'\beta(\tau)$ 
as 

$$\psi'\hat\beta(\tau) - \hat c'_{1/2}/\hat A_T \quad\text{and}\quad [\psi'\hat\beta(\tau) - \hat c'_{1-\alpha/2}/\hat A_T,\ \psi'\hat\beta(\tau) - \hat c'_{\alpha/2}/\hat A_T],$$ 

respectively. 

To estimate the critical values for inference on the boundary parameter $\beta_e$ of Comment 3.2, 
we can use either the analytical or the subsampling methods presented below, with the difference 
that in subsampling we need to recenter by the full sample estimate $\hat\beta_e = \hat\beta(1/T)$. 

As mentioned in Section 2, construction of consistent estimates of $A_T$ requires additional 
strong restrictions on the underlying model as well as additional steps in estimation. For 
example, suppose the nonparametric slowly varying component $L(\tau)$ of $A_T$ is replaced by 
a constant $L$, i.e. suppose that as $\tau \searrow 0$ 

$$1/Q_U(\tau) = L \cdot \tau^{\xi} \cdot (1 + \delta(\tau)) \text{ for some } L \in \mathbb{R}, \text{ where } \delta(\tau) \to 0. \quad (3.14)$$ 

We can estimate the constants $L$ and $\xi$ via Pickands-type procedures: 

$$\hat\xi = \frac{-1}{\ln 2}\,\ln\frac{\bar X_T'(\hat\beta(4\tau_T) - \hat\beta(2\tau_T))}{\bar X_T'(\hat\beta(2\tau_T) - \hat\beta(\tau_T))} \quad\text{and}\quad \hat L = \frac{(2^{-\hat\xi} - 1)\cdot\tau_T^{-\hat\xi}}{\bar X_T'(\hat\beta(2\tau_T) - \hat\beta(\tau_T))}, \quad (3.15)$$ 

where $\tau_T$ is chosen to be of an intermediate order, $\tau_T T \to \infty$ and $\tau_T \to 0$. Theorem 4 in 
Chernozhukov (2005) shows that under C1-C4, condition (3.14), and additional conditions 
on the sequence $(\delta(\tau_T), \tau_T)$, $\hat\xi = \xi + o_p(1/\ln T)$ and $\hat L \to_p L$, which produces the required 
consistent estimate $\hat A_T = \hat L (1/T)^{\hat\xi}$ such that $\hat A_T / A_T \to_p 1$. These additional conditions 
on the tails of Y and on the sequence $(\delta(\tau_T), \tau_T)$ highlight the drawbacks of this inference 
approach relative to the previous one. 
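
The need for the rate $o_p(1/\ln T)$ rather than plain consistency of the tail-index estimate can be seen from a one-line calculation. The sketch below assumes that (3.14) has the form $1/Q_U(\tau) = L\tau^{\xi}(1+\delta(\tau))$, so that $A_T = 1/Q_U(1/T) = L\,T^{-\xi}(1+\delta(1/T))$, and that the plug-in estimate is $\hat A_T = \hat L\,T^{-\hat\xi}$; both forms are reconstructions from the surrounding text rather than quotations.

```latex
\frac{\hat A_T}{A_T}
  = \frac{\hat L\, T^{-\hat\xi}}{L\, T^{-\xi}\,(1+\delta(1/T))}
  = \frac{\hat L}{L}\cdot
    \frac{\exp\{-(\hat\xi-\xi)\ln T\}}{1+\delta(1/T)} .
```

Since $\hat L/L \to_p 1$ and $\delta(1/T) \to 0$, the ratio tends to one in probability exactly when $(\hat\xi - \xi)\ln T \to_p 0$, which is the stated rate requirement.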

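We provide consistent estimates of $c'_\alpha$ in the next section. 

Theorem 3 (Inference and Median-Unbiased Estimation using CN-QR). Assume the conditions 
of Theorem 1 hold. Suppose that we have $\hat A_T$ such that $\hat A_T / A_T \to_p 1$ and $\hat c'_\alpha$ such 
that $\hat c'_\alpha \to_p c'_\alpha$. Then, 

$$\lim_{T\to\infty} P\{\psi'\hat\beta(\tau) - \hat c'_{1/2}/\hat A_T \le \psi'\beta(\tau)\} = 1/2$$ 

and 

$$\lim_{T\to\infty} P\{\psi'\hat\beta(\tau) - \hat c'_{1-\alpha/2}/\hat A_T \le \psi'\beta(\tau) \le \psi'\hat\beta(\tau) - \hat c'_{\alpha/2}/\hat A_T\} = 1 - \alpha.$$ 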



The rate of convergence of $\hat\xi$ is $\max[1/\sqrt{\tau_T T},\ \ln \delta(\tau_T)]$, which gives the following condition on the sequence 
$(\delta(\tau_T), \tau_T)$: $\max[1/\sqrt{\tau_T T},\ \ln \delta(\tau_T)] = o(1/\ln T)$. 

4. Estimation of Critical Values 

4.1. Subsampling-Based Estimation of Critical Values. Our resampling method for 
inference uses subsamples to estimate the distribution of SN-QR, as in standard subsampling. 
However, in contrast to standard subsampling, our method bypasses estimation of the 
unknown convergence rate $A_T$ by using self-normalized statistics. Our method also employs 
a special recentering that allows us to avoid the inconsistency of standard subsampling due 
to diverging QR statistics when $\xi > 0$. 

The method has the following steps. First, consider all subsets of the data $\{W_t = (Y_t, X_t), 
t = 1, ..., T\}$ of size b; if $\{W_t\}$ is a time series, consider $B_T = T - b + 1$ subsets of size b of the 
form $\{W_i, ..., W_{i+b-1}\}$. Then compute the analogs of the SN-QR statistic, denoted $\hat V_{i,b,T}$ and 
defined below in equation (4.17), for each i-th subsample for $i = 1, ..., B_T$. Second, obtain 
$\hat c_\alpha$ as the sample $\alpha$-quantile of $\{\hat V_{i,b,T}, i = 1, ..., B_T\}$. In practice, a smaller number $B_T$ of 
randomly chosen subsets can be used, provided that $B_T \to \infty$ as $T \to \infty$. (See Section 
2.5 in Politis, Romano, and Wolf (1999).) Politis, Romano, and Wolf (1999) and Bertail, 
Haefke, Politis, and White (2004) provide rules for the choice of subsample size b. 

The SN-QR statistic for the full sample of size T is: 

$$V_T := \mathcal{A}_T\,\psi'(\hat\beta(\tau_T) - \beta(\tau_T)) \quad\text{for}\quad \mathcal{A}_T = \frac{\sqrt{\tau_T T}}{\bar X_T'(\hat\beta(m\tau_T) - \hat\beta(\tau_T))}, \quad (4.16)$$ 

where we can set $m = (d + p)/(\tau_T T) + 1 = (d + p)/k + 1 + o(1)$, where $p \ge 1$ is the spacing 
parameter, which we set to 5. In this section we write $\tau_T$ to emphasize the theoretical 
dependence of the quantile of interest $\tau$ on the sample size. In each i-th subsample of size 
b, we compute the following analog of $V_T$: 

$$\hat V_{i,b,T} := \mathcal{A}_{i,b,T}\,\psi'(\hat\beta_{i,b,T}(\tau_b) - \hat\beta(\tau_b)) \quad\text{for}\quad \mathcal{A}_{i,b,T} := \frac{\sqrt{\tau_b b}}{\bar X_{i,b,T}'(\hat\beta_{i,b,T}(m\tau_b) - \hat\beta_{i,b,T}(\tau_b))}, \quad (4.17)$$ 

where $\hat\beta(\tau)$ is the $\tau$-quantile regression coefficient computed using the full sample, $\hat\beta_{i,b,T}(\tau)$ is 
the $\tau$-quantile regression coefficient computed using the i-th subsample, $\bar X_{i,b,T}$ is the sample 
mean of the regressors in the i-th subsample, and $\tau_b := (\tau_T T)/b$. The determination of 
$\tau_b$ is a critical decision that sets apart the extremal order approximation from the central 
order approximation. In the latter case, one sets $\tau_b = \tau_T$ in subsamples. In the extreme 
order approximation, our choice of $\tau_b$ gives the same extreme order of $\tau_b b$ in the subsample 
as the order of $\tau_T T$ in the full sample. 

Variation of the spacing parameter from p = 2 to p = 20 yielded similar results in our Monte-Carlo experiments. 

In practice, it is reasonable to use the following finite-sample adjustment to $\tau_b$: $\tau_b = \min[(\tau_T T)/b, .2]$ if 
$\tau_T < .2$, and $\tau_b = \tau_T$ if $\tau_T \ge .2$. The idea is that $\tau_T$ is judged to be non-extremal if $\tau_T \ge .2$, and the 
subsampling procedure reverts to central order inference. The truncation of $\tau_b$ by .2 is a finite-sample adjustment 
that restricts the key statistics $\hat V_{i,b,T}$ to be extremal in subsamples. These finite-sample adjustments do not 
affect the asymptotic arguments. 
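
The subsampling steps can be illustrated in the non-regression case, where the only regressor is the intercept (d = 1) and the quantile regression estimate reduces to a sample quantile. The sketch below is an illustration under that simplifying assumption and is not the authors' code: all function names are hypothetical, the sample quantile is taken to be an order statistic, subsamples are drawn at random without replacement (the i.i.d. case), and the .2 truncation of the subsample quantile index follows the finite-sample adjustment described in the text.

```python
import math
import random


def order_stat(sorted_vals, tau):
    """tau-quantile as the ceil(tau * n)-th order statistic (intercept-only QR)."""
    n = len(sorted_vals)
    rank = math.ceil(tau * n - 1e-9)  # small guard against floating-point round-up
    return sorted_vals[min(max(rank, 1), n) - 1]


def sn_scale(sorted_vals, tau, m):
    """Self-normalizing scale sqrt(tau * n) / (q(m * tau) - q(tau))."""
    spacing = order_stat(sorted_vals, m * tau) - order_stat(sorted_vals, tau)
    if spacing <= 0:
        raise ValueError("quantile spacing must be positive")
    return math.sqrt(tau * len(sorted_vals)) / spacing


def subsample_tau(tau, T, b, cutoff=0.2):
    """tau_b = tau * T / b, truncated at `cutoff`; central quantiles keep tau."""
    return min(tau * T / b, cutoff) if tau < cutoff else tau


def sn_subsampling_ci(y, tau, b, alpha=0.10, p=5, n_sub=200, seed=0):
    """(lower, upper) limits of a 100(1 - alpha)% interval for the tau-quantile."""
    rng = random.Random(seed)
    T, d = len(y), 1                      # d = 1: intercept only
    full = sorted(y)
    m = (d + p) / (tau * T) + 1           # spacing rule m = (d + p)/(tau T) + 1
    tau_b = subsample_tau(tau, T, b)
    center = order_stat(full, tau_b)      # recenter at the full-sample tau_b-quantile
    stats = []
    for _ in range(n_sub):
        sub = sorted(rng.sample(y, b))    # subsample without replacement
        stats.append(sn_scale(sub, tau_b, m) * (order_stat(sub, tau_b) - center))
    stats.sort()
    estimate, scale = order_stat(full, tau), sn_scale(full, tau, m)
    upper_crit = order_stat(stats, 1 - alpha / 2)
    lower_crit = order_stat(stats, alpha / 2)
    return estimate - upper_crit / scale, estimate - lower_crit / scale
```

For example, with T = 500, $\tau$ = .02 and b = 100, the full-sample order is $\tau T$ = 10 and the subsample index is $\tau_b$ = .1, so that $\tau_b b$ = 10 as well. The key design point is that each subsample statistic is recentered at the full-sample estimate for the index $\tau_b$, not for $\tau$.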

Under the additional parametric assumptions on the tail behavior stated earlier, we can 
estimate the quantiles of the limit distribution of CN-QR using the following procedure: 
First, create subsamples $i = 1, ..., B_T$ as before and compute in each subsample: $\hat V'_{i,b,T} := 
\hat A_b\,\psi'(\hat\beta_{i,b,T}(\tau_b) - \hat\beta(\tau_b))$, where $\hat A_b$ is any consistent estimate of $A_b$. For example, under the 
parametric restrictions specified in (3.14), set $\hat A_b = \hat L b^{-\hat\xi}$ for $\hat L$ and $\hat\xi$ specified in (3.15). 
Second, obtain $\hat c'_\alpha$ as the $\alpha$-quantile of $\{\hat V'_{i,b,T}, i = 1, ..., B_T\}$. 

The following theorems establish the consistency of $\hat c_\alpha$ and $\hat c'_\alpha$: 

Theorem 4 (Critical Values for SN-QR by Resampling). Suppose the assumptions of 
Theorems 1 and 2 hold, $b/T \to 0$, $b \to \infty$, $T \to \infty$ and $B_T \to \infty$. Then $\hat c_\alpha \to_p c_\alpha$. 

Theorem 5 (Critical Values for CN-QR by Resampling). Suppose the assumptions of 
Theorems 1 and 2 hold, $b/T \to 0$, $b \to \infty$, $T \to \infty$, $B_T \to \infty$, and $\hat A_b$ is such that $\hat A_b/A_b \to_p 1$. 
Then $\hat c'_\alpha \to_p c'_\alpha$. 

Comment 4.1. Our subsampling method based on CN-QR or SN-QR produces consistent 
critical values in the regression case, and may also be of independent interest in the 
non-regression case. Our method differs from conventional subsampling in several respects. 
First, conventional subsampling uses fixed normalizations $A_T$ or their consistent estimates. 
In contrast, in the case of SN-QR we use the random normalization $\mathcal{A}_T$, thus avoiding 
estimation of $A_T$. Second, conventional subsampling recenters by the full sample estimate 
$\hat\beta(\tau_T)$. Recentering in this way requires $A_b/A_T \to 0$ for obtaining consistency (see Theorem 
2.2.1 in Politis, Romano, and Wolf (1999)), but here we have $A_b/A_T \to \infty$ when $\xi > 0$. 
Thus, when $\xi > 0$ the extreme order QR statistics $\hat\beta(\tau_T)$ diverge, and 
conventional subsampling is inconsistent. To overcome the inconsistency, our 
approach instead uses $\hat\beta(\tau_b)$ for recentering. This statistic itself may diverge, but because it 
is an intermediate order QR statistic, the speed of its divergence is strictly slower than that 
of $A_T$. Hence our method of recentering exploits the special structure of order statistics in 
both the regression and non-regression cases. 

4.2. Analytical Estimation of Critical Values. Analytical inference uses the quantiles 
of the limit distributions found in Theorem 1. This approach is much more demanding in 
practice than the previous subsampling method. 

The method developed below is also of independent interest in situations where the limit distributions 
involve Poisson processes with unknown nuisance parameters, as, for example, in Chernozhukov and Hong 
(2004). 

Define the following random vector: 

$$\hat Z^*_\infty(k) = \chi \cdot \arg\min_{z \in \mathbb{R}^d} \Big[ -k\bar X'(z + k^{-\hat\xi}\hat\gamma) + \sum_{t=1}^{\infty} \{ X_t^{*\prime}(z + k^{-\hat\xi}\hat\gamma) - \chi\,(X_t^{*\prime}\hat\gamma)\,\Gamma_t^{-\hat\xi} \}_+ \Big], \quad (4.18)$$ 

for some consistent estimates $\hat\xi$ and $\hat\gamma$, e.g., those given in equation (4.19); where $\{y\}_+ := \max(y, 0)$, $\chi = 1$ if 
$\hat\xi < 0$ and $\chi = -1$ if $\hat\xi > 0$, $\{\Gamma_1, \Gamma_2, ...\} = \{\mathcal{E}_1, \mathcal{E}_1 + \mathcal{E}_2, ...\}$; $\{\mathcal{E}_1, \mathcal{E}_2, ...\}$ is an i.i.d. sequence 
of standard exponential variables; $\{X^*_1, X^*_2, ...\}$ is an i.i.d. sequence with distribution 
function $\hat F_X$, where $\hat F_X$ is any smooth consistent estimate of $F_X$, e.g., a smoothed empirical 
distribution function of the sample $\{X_i, i = 1, ..., T\}$. Moreover, the sequence $\{X^*_1, X^*_2, ...\}$ 
is independent from $\{\mathcal{E}_1, \mathcal{E}_2, ...\}$. Also, let $Z^*_\infty(k) = \sqrt{k}\,\hat Z^*_\infty(k)/[\bar X'(\hat Z^*_\infty(mk) - \hat Z^*_\infty(k)) + 
\chi(m^{-\hat\xi} - 1)k^{-\hat\xi}]$. The estimates $\hat c'_\alpha$ and $\hat c_\alpha$ are obtained by taking $\alpha$-quantiles of the 
variables $\psi' \hat Z^*_\infty(k)$ and $\psi' Z^*_\infty(k)$, respectively. In practice, these quantiles can only be evaluated 
numerically as described below. 

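The analytical inference procedure requires consistent estimators of $\xi$ and $\gamma$. Theorem 
4.5 of Chernozhukov (2005) provides the following estimators based on Pickands-type 
procedures: 

$$\hat\xi = \frac{-1}{\ln 2}\,\ln\frac{\bar X_T'(\hat\beta(4\tau_T) - \hat\beta(2\tau_T))}{\bar X_T'(\hat\beta(2\tau_T) - \hat\beta(\tau_T))} \quad\text{and}\quad \hat\gamma = \frac{\hat\beta(2\tau_T) - \hat\beta(\tau_T)}{\bar X_T'(\hat\beta(2\tau_T) - \hat\beta(\tau_T))}, \quad (4.19)$$ 

which is consistent if $\tau_T T \to \infty$ and $\tau_T \to 0$. 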
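
As a sketch of how these Pickands-type estimators could be computed from three fitted coefficient vectors, consider the following. The function and argument names are hypothetical, the formulas are the reconstructed forms $\hat\xi = -\ln(s_2/s_1)/\ln 2$ and $\hat\gamma = (\hat\beta(2\tau) - \hat\beta(\tau))/s_1$ with $s_1 = \bar X'(\hat\beta(2\tau) - \hat\beta(\tau))$ and $s_2 = \bar X'(\hat\beta(4\tau) - \hat\beta(2\tau))$, and the quantile regression fits themselves are assumed to be supplied by the user.

```python
import math


def pickands_estimates(beta_tau, beta_2tau, beta_4tau, x_bar):
    """Pickands-type estimates (xi_hat, gamma_hat) from three QR coefficient vectors.

    beta_tau, beta_2tau, beta_4tau : QR coefficients at indices tau, 2*tau, 4*tau
    x_bar                          : sample mean of the regressors
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    diff_low = [b2 - b1 for b1, b2 in zip(beta_tau, beta_2tau)]
    diff_high = [b4 - b2 for b2, b4 in zip(beta_2tau, beta_4tau)]
    s1, s2 = dot(x_bar, diff_low), dot(x_bar, diff_high)
    if s1 <= 0 or s2 <= 0:
        raise ValueError("fitted quantiles at the mean regressor must be increasing")
    xi_hat = -math.log(s2 / s1) / math.log(2.0)
    gamma_hat = [v / s1 for v in diff_low]   # normalized so that x_bar' gamma_hat = 1
    return xi_hat, gamma_hat
```

In the intercept-only case with an exact power-law quantile function $Q(\tau) = -\tau^{-\xi}$, the ratio of spacings equals $2^{-\xi}$, so the estimator recovers $\xi$ exactly.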

Theorem 6 (Critical Values for SN-QR by Analytical Method). Assume the conditions of 
Theorem 1 hold. Then, for any estimators of the nuisance parameters such that $\hat\xi \to_p \xi$ and 
$\hat\gamma \to_p \gamma$, we have that $\hat c_\alpha \to_p c_\alpha$. 

Theorem 7 (Critical Values for CN-QR by Analytical Method). Assume the conditions of 
Theorem 1 hold. Then, for any estimators of the nuisance parameters such that $\hat\xi \to_p \xi$ and 
$\hat\gamma \to_p \gamma$, we have that $\hat c'_\alpha \to_p c'_\alpha$. 

Comment 4.2. Since the distributions of $\hat Z_\infty(k)$ and $Z_\infty(k)$ do not have closed form, 
except in very special cases, $\hat c'_\alpha$ and $\hat c_\alpha$ can be obtained numerically via the following Monte 
Carlo procedure. First, for each $i = 1, ..., B$ compute $\hat Z^*_{i,\infty}(k)$ and $Z^*_{i,\infty}(k)$ using formula 
(4.18) by simulation, where the infinite summation is truncated at some finite value M. 
Second, take $\hat c'_\alpha$ and $\hat c_\alpha$ as the sample $\alpha$-quantiles of the samples $\{\psi'\hat Z^*_{i,\infty}(k), i = 1, ..., B\}$ 
and $\{\psi' Z^*_{i,\infty}(k), i = 1, ..., B\}$, respectively. We have found in numerical experiments that 
choosing $M \ge 200$ and $B \ge 100$ provides accurate estimates. 

We need smoothness of the distribution of the regressors $X_i$ to guarantee uniqueness of the solution of the 
optimization problem (4.18); a similar device is used by De Angelis, Hall, and Young (1993) in the context 
of Edgeworth expansion for median regression. The empirical distribution function (edf) of $X_i$ is not suited 
for this purpose, since it assigns point masses to sample points. However, making random draws from the 
edf and adding small noise with variance that is inversely proportional to the sample size produces draws 
from a smoothed empirical distribution function which is uniformly consistent with respect to $F_X$. 
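
The Monte Carlo procedure can be illustrated in the simplest special case: intercept only (X = 1, $\gamma$ = 1) and $\xi < 0$. Under the Poisson representation used in the proof of Theorem 1, the points are $J_t = \Gamma_t^{-\xi}$ and the minimizer at order k is then the point $J_{\lceil k \rceil}$, so no numerical optimization or truncation is needed. This scalar reduction is our own derivation for illustration and not a formula stated in the paper; the function names are hypothetical, and for integer k (where the minimizer is not unique) the smallest minimizer is used by convention.

```python
import math
import random


def draw_limit_statistics(k, m, xi, rng):
    """One joint draw of (CN-QR limit, SN-QR limit) for X = 1, gamma = 1, xi < 0."""
    if xi >= 0 or k * (m - 1) <= 1:
        raise ValueError("sketch assumes xi < 0 and k * (m - 1) > d = 1")
    n_points = math.ceil(m * k - 1e-9)
    gammas, total = [], 0.0
    for _ in range(n_points):
        total += rng.expovariate(1.0)      # Gamma_t = E_1 + ... + E_t
        gammas.append(total)

    def point(order):                       # J_t = Gamma_t^{-xi} at t = ceil(order)
        return gammas[math.ceil(order - 1e-9) - 1] ** (-xi)

    z_tilde_k, z_tilde_mk = point(k), point(m * k)
    z_hat_k = z_tilde_k - k ** (-xi)        # recenter by the limit of A_T (beta(tau) - beta_e)
    z_sn_k = math.sqrt(k) * z_hat_k / (z_tilde_mk - z_tilde_k)
    return z_hat_k, z_sn_k


def simulated_critical_values(k, m, xi, alphas, n_draws=2000, seed=0):
    """Sample alpha-quantiles of the simulated SN-QR limit variable."""
    rng = random.Random(seed)
    draws = sorted(draw_limit_statistics(k, m, xi, rng)[1] for _ in range(n_draws))
    ranks = {a: min(max(math.ceil(a * n_draws - 1e-9), 1), n_draws) for a in alphas}
    return {a: draws[r - 1] for a, r in ranks.items()}
```

In the regression case the same loop applies, but each draw requires solving the convex program in (4.18) with simulated regressors, which is where the truncation value M enters.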

5. Extreme Value vs. Normal Inference: Comparisons 

5.1. Properties of Confidence Intervals with Unknown Nuisance Parameters. In 
this section we compare the inferential performance of normal and extremal confidence 
intervals (CI) using the model: $Y_t = X_t'\beta + U_t$, $t = 1, ..., 500$, $d = 7$, $\beta_j = 1$ for $j \in 
\{1, ..., 7\}$, where the disturbances $\{U_t\}$ are i.i.d. and follow either (1) a t distribution with 
$\nu \in \{1, 3, 30\}$ degrees of freedom, or (2) a Weibull distribution with the shape parameter 
$\alpha \in \{1, 3, 30\}$. These distributions have EV indexes $\xi = 1/\nu \in \{1, 1/3, 1/30\}$ and $\xi = 
-1/\alpha \in \{-1, -1/3, -1/30\}$, respectively. Regressors are drawn with replacement from the 
empirical application in Section 6.1 in order to match a real situation as closely as possible. 
The design of the first type corresponds to tail properties of financial data, including returns 
and trade volumes; and the design of the second type corresponds to tail properties of 
microeconomic data, including birthweights, wages, and bids. Figures 2 and 3 plot coverage 
properties of CIs for the intercept and one of the slope coefficients based on subsampling 
the SN-QR statistic with $B_T = 200$ and $b = 100$, and on the normal inference method 
suggested by Powell (1986) with a Hall-Sheather type rule for the bandwidth suggested in 
Koenker (2005). The figures are based on QR estimates for $\tau \in \{.01, .05, .10, .25, .50\}$, i.e. 
$\tau T \in \{5, 25, 50, 125, 250\}$. 

When the disturbances follow t distributions, the extremal CIs have good coverage 
properties, whereas the normal CIs typically undercover; their performance deteriorates in the 
degree of heavy-tailedness and improves in the index $\tau T$. In heavy-tailed cases ($\xi \in \{1, 1/3\}$) 
the normal CIs substantially undercover for extreme quantiles, as might be expected from 
the fact that the normal distribution fails to capture the heavy tails of the actual distribution 
of the QR statistic. In the thin-tailed case ($\xi = 1/30$), the normal CIs still undercover for 
extreme quantiles. The extremal CIs perform consistently better than normal CIs, giving 
coverages close to the nominal level of 90%. 

When the disturbances follow Weibull distributions, extremal CIs continue to have good 
coverage properties, whereas normal CIs either undercover or overcover, and their 
performance deteriorates in the degree of heavy-tailedness and improves in the index $\tau T$. In 
heavy-tailed cases ($\xi = -1$) the normal CIs strongly overcover, which results from the 
overdispersion of the normal distribution relative to the actual distribution of QR 
statistics. In the thin-tailed cases ($\xi = -1/30$) the normal CIs undercover and their performance 
improves in the index $\tau T$. In all cases, extremal CIs perform better than normal CIs, giving 
coverage rates close to the nominal level of 90% even for central quantiles. 

The regressor data as well as the Monte-Carlo programs are deposited at www.mit.edu/vchern. 

The alternative options implemented in the statistical package R to obtain standard errors for the 
normal method give similar results. These results are available from the authors upon request. 

We also compare forecasting properties of ordinary QR estimators and median-bias-corrected 
QR estimators of the intercept and slope coefficients, using the median absolute 
deviation and median bias as measures of performance (other measures may not be 
well-defined). We find that the gains to bias-correcting appear to be very small, except in the 
finite-support case with disturbances that are heavy-tailed near the boundary. We do not 
report these results for the sake of brevity. 

5.2. Practicalities and Rules of Thumb. Equipped with both simulation experiments 
and practical experience, we provide a simple rule-of-thumb for the application of extremal 
inference. Recall that the order of a sample $\tau$-quantile in the sample of size T is the 
number $\tau T$ (rounded to the next integer). This order plays a crucial role in determining 
whether extremal inference or central inference should be applied. Indeed, the former 
requires $\tau T \to k$ whereas the latter requires $\tau T \to \infty$. In the regression case, in addition 
to the number $\tau T$, we need to take into account the number of regressors. As an example, 
let us consider the case where all d regressors are indicators that equally divide the sample 
of size T into subsamples of size T/d. Then the QR statistic will be determined by sample 
quantiles of order $\tau T/d$ in each of these d subsamples. We may therefore think of the 
number $\tau T/d$ as being a dimension-adjusted order for QR. A common simple rule for the 
application of the normal law is that the sample size is greater than 30. This suggests we 
should use extremal inference whenever $\tau T/d < 30$. This simple rule may or may not be 
conservative. For example, when regressors are continuous, our computational experiments 
indicate that normal inference performs as well as extremal inference as soon as $\tau T/d \ge 
15$ to $20$, which suggests using extremal inference when $\tau T/d < 15$ to $20$ for this case. On 
the other hand, if we have an indicator variable that picks out 2% of the entire sample, as 
in the birthweight application presented below, then the number of observations below the 
fitted quantile for this subsample will be $\tau T/50$, which motivates using extremal inference 
when $\tau T/50 < 15$ to $20$ for this case. This rule is far more conservative than the original 
simple rule. Overall, it seems prudent to use both extremal and normal inference methods 
in most cases, with the idea that the discrepancies between the two can indicate extreme 
situations. Indeed, note that our methods based on subsampling perform very well even in 
the non-extreme cases (see Figures 2 and 3). 

6. Empirical Examples 

6.1. Extremal Risk of a Stock. We consider the problem of finding factors that affect the 
value-at-risk of the Occidental Petroleum daily stock return, a problem that is interesting 
for both economic analysis and real-world risk management. Our data set consists of 
1,000 daily observations covering the period 1996-1998. The dependent variable $Y_t$ is the 
daily return of the Occidental Petroleum stock and the regressors $X_{1t}$, $X_{2t}$, and $X_{3t}$ are 
the lagged return on the spot price of oil, the lagged one-day return of the Dow Jones 
Industrials index (market return), and the lagged own return $Y_{t-1}$, respectively. We use 
a flexible asymmetric linear specification where $X_t = (1, X_{jt}^+, X_{jt}^-, j = 1, 2, 3)$ with 
$X_{jt}^+ = \max(X_{jt}, 0)$, $X_{jt}^- = -\min(X_{jt}, 0)$ and $j \in \{1, 2, 3\}$. 
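
The asymmetric specification splits each regressor into its positive and negative parts, so that upward and downward movements can have different effects. A minimal sketch of the construction for one observation follows; the function name and the ordering of the columns (intercept first, then positive and negative parts of each regressor in turn) are assumptions made for illustration.

```python
def asymmetric_regressors(x1, x2, x3):
    """Build (1, X1+, X1-, X2+, X2-, X3+, X3-) for one observation.

    X+ = max(X, 0) is the positive part and X- = -min(X, 0) = max(-X, 0) is the
    magnitude of the negative part, so both parts are non-negative.
    """
    row = [1.0]
    for x in (x1, x2, x3):
        row.append(max(0.0, x))    # positive part
        row.append(max(0.0, -x))   # negative part, reported as a magnitude
    return row
```

With this coding, a positive coefficient on a negative part means that a larger past drop raises the fitted conditional quantile.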

We begin by stating overall estimation results for the basic predictive linear model. A 
detailed specification and goodness-of-fit analysis of this model has been given in Cher- 
nozhukov and Umantsev (2001), whereas here we focus on the extremal analysis in order to 
illustrate the new inferential tools. Figure 4 plots QR estimates $\hat\beta(\tau) = (\hat\beta_j(\tau), j = 0, ..., 7)$ 
along with 90% pointwise confidence intervals. We use both extremal CIs (solid lines) and 
normal CIs (dashed lines). Figures 5 and 6 plot bias-corrected QR estimates along with 
pointwise CIs for the lower and upper tails, respectively. 

We focus the discussion on the impact of downward movements of the explanatory 
variables, namely $X_{1t}^-$, $X_{2t}^-$, and $X_{3t}^-$, on the extreme risk, that is, on the low 
conditional/predicted quantiles of the stock return. The estimate of the coefficient on the negative 
spot price of oil, $X_{1t}^-$, is positive in the lower tail of the distribution and negative in the 
center, but it is not statistically significant at the 90% level. However, the extremal CIs 
indicate that the distribution of the QR statistic is asymmetric in the far left tail, hence the 
economic effect of the spot price of oil may potentially be quite strong. Thus, past drops 
in the spot price of oil potentially strongly decrease the extreme risk. The estimate of the 
coefficient on the negative market return, $X_{2t}^-$, is significantly negative in the far left tail 
but not in the center of the conditional distribution. From this we may conclude that the 
past market drops appear to significantly increase the extreme risk. The estimates of the 
coefficient on the negative lagged own return, $X_{3t}^-$, are significantly negative in the lower half 
of the conditional distribution. We may conclude that past drops in own return significantly 
increase extreme and intermediate risks. 

Finally, we compare the CIs produced by extremal inference and normal inference. This 
empirical example closely matches the Monte-Carlo experiment in the previous section 
with heavy-tailed t(3) disturbances. From this experiment, we expect that in the empirical 
example normal CIs would understate the estimation uncertainty and would be considerably 
narrower than extremal CIs in the tails. As shown in Figures 5 and 6, normal CIs are 
indeed much narrower than extremal CIs at $\tau < .15$ and $\tau > .85$. 

On value-at-risk analysis and risk management, see Christoffersen, Hahn, and Inoue (1999), Diebold, 
Schuermann, and Stroughair (2000), Chernozhukov and Umantsev (2001), and Engle and Manganelli (2004). 

6.2. Extremal Birthweights. We investigate the impact of various demographic charac- 
teristics and maternal behavior on extremely low quantiles of birthweights of live infants 
born in the United States to black mothers of ages between 18 and 45. We use the June 1997 
Detailed Natality Data published by the National Center for Health Statistics. Previous 
studies by Abrevaya (2001) and Koenker and Hallock (2001) used the same data set, but 
they focused the analysis on typical birthweights, in a range between 2000 and 4500 grams. 
In contrast, equipped with extremal inference, we now venture far into the tails and study 
extremely low birthweight quantiles, in the range between 250 and 1500 grams. Some of 
our findings differ sharply from previous results for typical non-extremal quantiles. 

Our decision to focus the analysis on black mothers is motivated by Figure 7 which 
shows a troubling heavy tail of low birthweights for black mothers. We choose a linear 
specification similar to Koenker and Hallock (2001). The response variable is the birthweight 
recorded in grams. The set of covariates include: 'Boy,' an indicator of infant gender; 
'Married,' an indicator of whether the mother was married or not; 'No Prenatal,' 'Prenatal 
Second,' and 'Prenatal Third,' indicator variables that divide the sample into 4 categories: 
mothers with no prenatal visit (less than 1% of the sample), mothers whose first prenatal 
visit was in the second trimester, and mothers whose first prenatal visit was in the third 
trimester (The baseline category is mothers with a first visit in the first trimester, which 
constitute 83% of the sample); 'Smoker,' an indicator of whether the mother smoked during 
pregnancy; 'Cigarettes/Day,' the mother's reported average number of cigarettes smoked 
per day; 'Education,' a categorical variable taking a value of 0 if the mother had less than 
a high-school education, 1 if she completed high school education, 2 if she obtained some 
college education, and 3 if she graduated from college; 'Age' and 'Age²,' the mother's age 
and the mother's age squared, both in deviations from their sample means. Thus the 
control group consists of mothers of average age who had their first prenatal visit during 
the first trimester, that have not completed high school, and who did not smoke. The 
intercept in the estimated quantile regression model will measure quantiles for this group, 
and will therefore be referred to as the centercept. 



We exclude variables related to mother's weight gain during pregnancy because they might be 
simultaneously determined with the birthweights. 

Figures 8 and 9 report estimation results for extremal low quantiles and typical quantiles, 
respectively. These figures show point estimates, extremal 90% CIs, and normal 90% CIs. 
Note that the centercept in Figure 8 varies from 250 to about 1500 grams, indicating the 
approximate range of birthweights that our extremal analysis applies to. In what follows, 
we focus the discussion only on key covariates and on differences between extremal and 
central inference. 

While the density of birthweights, shown in Figure 7, has a finite lower support point, 
it has little probability mass near the boundary. This points towards a situation similar to 
the Monte Carlo design with Weibull disturbances, where differences between central and 
extremal inference occur only sufficiently far in the tails. This is what we observe in this 
empirical example as well. For the most part, normal CIs tend to be at most 15 percent 
narrower than extremal CIs, with the exception of the coefficient on 'No Prenatal', for 
which normal CIs are twice as narrow as extremal CIs. Since only 1.9 percent of mothers 
had no prenatal care, the sample size used to estimate this coefficient is only 635, which 
suggests that the discrepancies between extremal CIs and central CIs for the coefficient on 
'No Prenatal' should occur only when $\tau < 30/635 \approx 5\%$. As Figure 9 shows, differences 
between extremal CIs and normal CIs arise mostly when $\tau < 10\%$. 
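
The rule of thumb from Section 5.2 and the cutoff computed above can be written as a small helper. This is only a sketch of the arithmetic; the function names are hypothetical, and the default threshold of 30 is the simple rule from the text, which the authors note can be lowered to roughly 15 to 20 with continuous regressors.

```python
def dimension_adjusted_order(tau, n_obs, n_regressors=1):
    """Dimension-adjusted order tau * T / d of the tau-quantile regression."""
    return tau * n_obs / n_regressors


def suggested_inference(tau, n_obs, n_regressors=1, threshold=30.0):
    """Return 'extremal' when the dimension-adjusted order is below the threshold."""
    order = dimension_adjusted_order(tau, n_obs, n_regressors)
    return "extremal" if order < threshold else "normal"


def tau_cutoff(effective_n, threshold=30.0):
    """Largest quantile index below which extremal inference is suggested."""
    return threshold / effective_n
```

For the 'No Prenatal' cell of 635 observations, `tau_cutoff(635)` gives 30/635, about 4.7 percent, which the text rounds to 5 percent.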

The analysis of extremal birthweights, shown in Figure 8, reveals several departures from 
findings for typical birthweights in Figure 9. Most surprisingly, smoking appears to have 
no negative impact on extremal quantiles, whereas it has a strong negative effect on the 
typical quantiles. The lack of statistical significance in the tails could be due to selection, 
where only mothers confident of good outcomes smoke, or to smoking having little or no 
causal effect on very extreme outcomes. This finding motivates further analysis, possibly 
using data sets that enable instrumental variables strategies. 

Prenatal medical care has a strong impact on extremal quantiles and relatively little 
impact on typical quantiles, especially in the middle of the distribution. In particular, the 
impacts of 'Prenatal Second' and 'Prenatal Third' in the tails are very strongly positive. 
These effects could be due to mothers confident of good outcomes choosing to have a 
late first prenatal visit. Alternatively, these effects could be due to a late first prenatal 
visit providing better means for improving birthweight outcomes. The extremal CIs for 
'No Prenatal' include values between 0 and -800 grams, suggesting that the effect of 
'No Prenatal' in the tails is definitely non-positive and may be strongly negative. 

Appendix A. Proof of Theorem 1 
The proof will be given for the case when $\xi < 0$. The case with $\xi > 0$ follows very similarly. 

Step 1. Recall that $U_t = Y_t - X_t'\beta_e$ and consider the point process $\hat N$ defined by $\hat N(F) := 
\sum_{t=1}^{T} 1\{(A_T U_t, X_t) \in F\}$ for Borel subsets F of $E := [0, +\infty) \times \mathcal{X}$. The point process $\hat N$ converges 
in law in the metric space of point measures $M_p(E)$, which is equipped with the metric induced by 
the topology of vague convergence. The limit process is a Poisson point process N characterized 
by the mean intensity measure $m_N(F) := \int_F (x'\gamma)^{1/\xi} u^{-1/\xi}\,du\,dF_X(x)$. Given this form of the mean 
intensity measure we can represent 

$$N(F) := \sum_{t=1}^{\infty} 1\{(J_t, X_t) \in F\} \quad (A.20)$$ 

for all Borel subsets F of $E := [0, +\infty) \times \mathcal{X}$, where $J_t = (X_t'\gamma) \cdot \Gamma_t^{-\xi}$, $\Gamma_t = \mathcal{E}_1 + ... + \mathcal{E}_t$, for $t \ge 1$, 
$\{\mathcal{E}_t, t \ge 1\}$ is an i.i.d. sequence of standard exponential variables, and $\{X_t, t \ge 1\}$ is an i.i.d. sequence 
from the distribution $F_X$. Note that when $\xi > 0$ the same result and representation hold, except 
that we define $J_t = -(X_t'\gamma) \cdot \Gamma_t^{-\xi}$ (with a change of sign). 

The convergence in law $\hat N \Rightarrow N$ follows from the following steps. First, for any set F defined as 
intersection of a bounded rectangle with E, we have (a) $\lim_{T\to\infty} E\hat N(F) = m_N(F)$, which follows 
from the regular variation property of $F_U$ and C1, and (b) $\lim_{T\to\infty} P[\hat N(F) = 0] = e^{-m_N(F)}$, 
which follows by Meyer's (1973) theorem by the geometric strong mixing and by observing that 
$T\sum_{j=2}^{\lfloor T/k \rfloor} P((A_T U_1, X_1) \in F, (A_T U_j, X_j) \in F) \le O(T\lfloor T/k \rfloor P((A_T U_1, X_1) \in F)^2) = O(1/k)$ by C1 
and C5. Consequently, (a) and (b) imply by Kallenberg's theorem (Resnick 1987) that $\hat N \Rightarrow N$, 
where N is a Poisson point process with intensity measure $m_N$. 

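Step 2. Observe that $\tilde Z_T(k) := A_T(\hat\beta(\tau) - \beta_e) = \arg\min_{z \in \mathbb{R}^d} \sum_{t=1}^{T} \rho_\tau(A_T U_t - X_t'z)$. To see this 
define $z := A_T(\beta - \beta_e)$. Rearranging terms gives $\sum_{t=1}^{T} \rho_\tau(A_T U_t - X_t'z) = -\tau T \bar X_T'z - \sum_{t=1}^{T} 1(A_T U_t \le 
X_t'z)(A_T U_t - X_t'z) + \sum_{t=1}^{T} \tau A_T U_t$. Subtract $\sum_{t=1}^{T} \tau A_T U_t$, which does not depend on z and does not 
affect optimization, and define 

$$\tilde Q_T(z, k) := -\tau T \bar X_T'z + \sum_{t=1}^{T} \ell(A_T U_t, X_t'z) = -\tau T \bar X_T'z + \int_E \ell(u, x'z)\,d\hat N(u, x),$$ 

where $\ell(u, v) := 1(u \le v)(v - u)$. We have that $\tilde Z_T(k) = \arg\min_{z \in \mathbb{R}^d} \tilde Q_T(z, k)$. 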
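
The rearrangement in Step 2 is a pointwise identity for the check function $\rho_\tau(u) = (\tau - 1(u \le 0))u$, and it can be verified numerically. The sketch below checks it on random inputs; the function names are hypothetical, `scaled_u` plays the role of the values $A_T U_t$, and `xz` plays the role of the values $X_t'z$, so that $\tau T \bar X_T'z$ equals $\tau$ times the sum of `xz`.

```python
import random


def rho(tau, u):
    """Check function rho_tau(u) = (tau - 1(u <= 0)) * u."""
    return (tau - (1.0 if u <= 0 else 0.0)) * u


def ell(u, v):
    """l(u, v) = 1(u <= v) * (v - u)."""
    return (v - u) if u <= v else 0.0


def both_sides(tau, scaled_u, xz):
    """Left and right sides of the rearranged QR objective in Step 2."""
    lhs = sum(rho(tau, u - v) for u, v in zip(scaled_u, xz))
    rhs = (-tau * sum(xz)
           + sum(ell(u, v) for u, v in zip(scaled_u, xz))
           + tau * sum(scaled_u))
    return lhs, rhs


rng = random.Random(0)
scaled_u = [rng.expovariate(1.0) for _ in range(50)]   # non-negative, as when xi < 0
xz = [rng.uniform(-1.0, 3.0) for _ in range(50)]
left, right = both_sides(0.05, scaled_u, xz)
print(abs(left - right) < 1e-9)
```

Only the middle term of the right-hand side depends on the data through the points that fall below the fitted values, which is why the objective can be written as an integral against the point process.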

Since $\ell$ is continuous and vanishes outside a compact subset of E, the mapping $N \mapsto \int_E \ell(u, x'z)\,dN(u, x)$, 
which sends elements N of the metric space $M_p(E)$ to the real line, is continuous. Since $\tau T \bar X_T \to_p 
kE[X]$ and $\hat N \Rightarrow N$, by the Continuous Mapping Theorem we conclude that the finite-dimensional 
limit of $z \mapsto \tilde Q_T(z, k)$ is given by 

$$z \mapsto \tilde Q_\infty(z, k) := -kE[X]'z + \int_E \ell(j, x'z)\,dN(j, x) = -kE[X]'z + \sum_{t=1}^{\infty} \ell(J_t, X_t'z).$$ 

Next we recall the Convexity Lemma of Geyer (1996) and Knight (1999), which states that if 
(i) a sequence of convex lower-semicontinuous functions $\tilde Q_T : \mathbb{R}^d \to \bar{\mathbb{R}}$ converges in distribution in 
the finite-dimensional sense to $\tilde Q_\infty : \mathbb{R}^d \to \bar{\mathbb{R}}$ over a dense subset of $\mathbb{R}^d$, (ii) $\tilde Q_\infty$ is finite over a 
non-empty open set $Z_0 \subset \mathbb{R}^d$, and (iii) $\tilde Q_\infty$ is uniquely minimized at a random vector $\tilde Z_\infty$, then any 
argmin of $\tilde Q_T$, denoted $\tilde Z_T$, converges in distribution to $\tilde Z_\infty$. 

By the Convexity Lemma we conclude that $\tilde Z_T(k) \in \arg\min_{z \in \mathbb{R}^d} \tilde Q_T(z, k)$ converges in distribution 
to $\tilde Z_\infty(k) = \arg\min_{z \in \mathbb{R}^d} \tilde Q_\infty(z, k)$, where the random vector $\tilde Z_\infty(k)$ is uniquely defined by Lemma 
1 in Appendix E. 

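Step 3. By C1, $A_T(\beta(\tau) - \beta_e) \to k^{-\xi}\gamma$ as $\tau T \to k$ and $T \to \infty$. Thus $A_T(\hat\beta(\tau) - \beta(\tau)) \to_d 
\hat Z_\infty(k) := \tilde Z_\infty(k) - k^{-\xi}\gamma$. Then 

$$\hat Z_\infty(k) = \tilde Z_\infty(k) - k^{-\xi}\gamma = \arg\min_{z \in \mathbb{R}^d} \Big[ -kE[X]'(z + k^{-\xi}\gamma) + \sum_{t=1}^{\infty} \ell(J_t, X_t'(z + k^{-\xi}\gamma)) \Big].$$ 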

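Step 4. Similarly to step 2 it follows that 

$$(\tilde Z_T(mk), \tilde Z_T(k)) \in \arg\min_{(z_1', z_2')' \in \mathbb{R}^{2d}} \tilde Q_T(z_1, mk) + \tilde Q_T(z_2, k)$$ 

weakly converges to 

$$(\tilde Z_\infty(mk), \tilde Z_\infty(k)) = \arg\min_{(z_1', z_2')' \in \mathbb{R}^{2d}} \tilde Q_\infty(z_1, mk) + \tilde Q_\infty(z_2, k),$$ 

where the random vectors $\tilde Z_\infty(k)$ and $\tilde Z_\infty(mk)$ are uniquely defined by Lemma 1 in Appendix E. 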
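Therefore it follows that 

$$\Big(\hat Z_T(k),\ \bar X_T'(\tilde Z_T(mk) - \tilde Z_T(k))\Big) \to_d \Big(\hat Z_\infty(k),\ E[X]'(\tilde Z_\infty(mk) - \tilde Z_\infty(k))\Big).$$ 

By Lemma 1 in Appendix E, $E[X]'(\tilde Z_\infty(mk) - \tilde Z_\infty(k)) \neq 0$ a.s., provided that $mk - k > d$. It follows 
by the Extended Continuous Mapping Theorem that 

$$Z_T(k) = \frac{\sqrt{\tau T}\,\hat Z_T(k)}{\bar X_T'(\tilde Z_T(mk) - \tilde Z_T(k))} \to_d Z_\infty(k) := \frac{\sqrt{k}\,\hat Z_\infty(k)}{E[X]'(\tilde Z_\infty(mk) - \tilde Z_\infty(k))}.$$ 

Using the relations $\tilde Z_\infty(k) = \hat Z_\infty(k) + k^{-\xi}\gamma$ and $\tilde Z_\infty(mk) = \hat Z_\infty(mk) + (mk)^{-\xi}\gamma$, and $E[X]'\gamma = 1$ 
holding by C1, we can represent 

$$Z_\infty(k) = \frac{\sqrt{k}\,\hat Z_\infty(k)}{E[X]'(\hat Z_\infty(mk) - \hat Z_\infty(k)) + (m^{-\xi} - 1)k^{-\xi}}. \qquad \Box$$ 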

Appendix B. Proof of Theorems 2 and 3 
The results follow by Theorem 1 and the definition of convergence in distribution. □ 

Appendix C. Proof of Theorems 4 and 5 

We will prove Theorem 4. The proof of Theorem 5 follows similarly. The main step of the 
proof, step 1, is specific to our problem. Let $G_T(x) := \Pr\{V_T \le x\}$ and $G(x) := \Pr\{V_\infty \le x\} = 
\lim_{T\to\infty} G_T(x)$. 

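Step 1. Letting $V_{i,b,T} := \mathcal{A}_{i,b,T}\,\psi'(\hat\beta_{i,b,T}(\tau_b) - \beta(\tau_b))$, define 

$$\hat G_{b,T}(x) := B_T^{-1}\sum_{i=1}^{B_T} 1\{\hat V_{i,b,T} \le x\} = B_T^{-1}\sum_{i=1}^{B_T} 1\{V_{i,b,T} + \mathcal{A}_{i,b,T}\,\psi'(\beta(\tau_b) - \hat\beta(\tau_b)) \le x\},$$ 

$$\dot G_{b,T}(x; \Delta) := B_T^{-1}\sum_{i=1}^{B_T} 1\{V_{i,b,T} + \mathcal{A}_{i,b,T}\,\Delta/A_b \le x\},$$ 

where $A_b = 1/Q_U(1/b)$ is the canonical normalizing constant. Then 

$$1[V_{i,b,T} \le x - \mathcal{A}_{i,b,T} w_T/A_b] \le 1[\hat V_{i,b,T} \le x] \le 1[V_{i,b,T} \le x + \mathcal{A}_{i,b,T} w_T/A_b]$$ 

for all $i = 1, ..., B_T$, where $w_T = |A_b\,\psi'(\hat\beta(\tau_b) - \beta(\tau_b))|$. 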

The principal claim is that, under conditions of Theorem 3, $w_T = o_p(1)$. The claim follows by 
noting that for $k_T = \tau T \to k > 0$ as $b/T \to 0$ and $T \to \infty$, 



Ab X (^/3(r,) - /3(t,) 
/ 1 



2-^-1 
Qui2k^/b)-Qu{kT/b)'' " 



Qui2kr/b)~Quikr/b) 



krT 




(C.21) 



The first relation in (C.21) follows from two facts: First, by definition $A_b := 1/Q_U(1/b)$ and by the 
regular variation of $Q_U$ at 0 with exponent $-\xi$, for any $l \searrow 0$, $Q_U(l)(2^{-\xi} - 1) \sim Q_U(2l) - Q_U(l)$. 
Second, since $\tau_b = k_T/b$ and since $\tau_b \times T = (k_T/b) \times T \sim (k/b) \times T \to \infty$ at a polynomial speed in $T$ 
by $T/b \to \infty$ at a polynomial speed in $T$ by assumption, $\hat\beta(\tau_b)$ is the intermediate order regression 
quantile computed using the full sample of size $T$, so that by Theorem 3 in Chernozhukov (2005) 

hi2k^/b)-QuikT/by 



(C.22) 
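The regular-variation step used for the first relation can be checked numerically. The following is a minimal sketch, not part of the proof: the quantile function `q_u` is a hypothetical example (a power $l^{-\xi}$ times a slowly varying factor, with $\xi = -0.5$ chosen only for illustration), and `rv_ratio` is an illustrative helper name. It evaluates the ratio $[Q_U(2l) - Q_U(l)] / [Q_U(l)(2^{-\xi} - 1)]$, which should approach one as $l \searrow 0$.

```python
import math

def q_u(l, xi=-0.5):
    """Hypothetical quantile function near 0: l**(-xi) times a slowly varying factor."""
    return l ** (-xi) * (1.0 + 1.0 / math.log(1.0 / l))

def rv_ratio(l, xi=-0.5):
    """Ratio [Q_U(2l) - Q_U(l)] / [Q_U(l) * (2**(-xi) - 1)]; tends to 1 as l -> 0."""
    return (q_u(2 * l, xi) - q_u(l, xi)) / (q_u(l, xi) * (2 ** (-xi) - 1))

# The ratio drifts toward 1 as l shrinks; the drift is slow because the
# slowly varying factor changes only logarithmically.
for l in (1e-2, 1e-4, 1e-8, 1e-16):
    print(f"l = {l:.0e}: ratio = {rv_ratio(l):.4f}")
```

The slow approach to one illustrates why the relation is stated only as an asymptotic equivalence.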



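Given that $w_T = o_p(1)$, for some sequence of constants $\Delta_T \searrow 0$ as $T \to \infty$ the following event 
occurs wp $\to 1$: 

$$M_T = \left\{ \begin{array}{l} 1[V_{i,b,T} \le x - A_{i,b,T}\Delta_T / A_b] \le 1[V_{i,b,T} \le x - A_{i,b,T} w_T / A_b] \\ \quad \le 1[\hat V_{i,b,T} \le x] \\ \quad \le 1[V_{i,b,T} \le x + A_{i,b,T} w_T / A_b] \\ \quad \le 1[V_{i,b,T} \le x + A_{i,b,T}\Delta_T / A_b] \quad \text{for all } i = 1, \ldots, B_T \end{array} \right\}.$$ 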



Event $M_T$ implies 

$$G_{b,T}(x; \Delta_T) \le \hat G_{b,T}(x) \le G_{b,T}(x; -\Delta_T). \tag{C.23}$$ 



Step 2. In this part we show that at the continuity points of $G(x)$, $G_{b,T}(x; \pm\Delta_T) \to_p G(x)$. 
First, by non-replacement sampling 

$$E[G_{b,T}(x; \Delta_T)] = P[V_b - A_{b,T}\Delta_T / A_b \le x]. \tag{C.24}$$ 

Second, at the continuity points of $G(x)$ 

$$\lim_{b/T \to 0,\ b \to \infty} E[G_{b,T}(x; \Delta_T)] = \lim_{b/T \to 0,\ b \to \infty} P[V_b - A_{b,T}\Delta_T / A_b \le x] = P[\psi' Z_\infty(k) \le x] = G(x). \tag{C.25}$$ 

The statement (C.25) follows because $V_b - A_{b,T}\Delta_T / A_b = V_b + o_p(1) \to_d \psi' Z_\infty(k)$, since by Theorem 1 
$V_b \to_d \psi' Z_\infty(k)$ and, by the proof of Theorem 1 and by $\Delta_T \searrow 0$, 

$$\frac{A_{b,T}\Delta_T}{A_b} = O_p(1) \cdot \Delta_T = O_p(1) \cdot o(1) = o_p(1).$$ 



Third, because $G_{b,T}(x; \Delta_T)$ is a U-statistic of degree $b$, by the LLN for U-statistics in Politis, 
Romano, and Wolf (1999), $\mathrm{Var}(G_{b,T}(x; \Delta_T)) = o(1)$. This shows that $G_{b,T}(x; \Delta_T) \to_p G(x)$. By the 
same argument $G_{b,T}(x; -\Delta_T) \to_p G(x)$. 

Step 3. Finally, since event $M_T$ occurs wp $\to 1$ and so does (C.23), by Step 2 it follows that 
$\hat G_{b,T}(x) \to_p G(x)$ for each $x \in \mathbb{R}$. Convergence of distribution functions at continuity 
points implies convergence of quantile functions at continuity points. Therefore, by the Extended 
Continuous Mapping Theorem, $\hat c_\alpha = \hat G_{b,T}^{-1}(\alpha) \to_p c_\alpha = G^{-1}(\alpha)$, provided $G^{-1}(\alpha)$ is a continuity 
point of $G(x)$. □ 
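The mechanics of the subsampling argument (statistics computed on subsamples drawn without replacement, centered at the full-sample estimate, and summarized by their empirical quantile) can be sketched on a toy problem. The example below is an assumption-laden illustration and not the paper's quantile regression procedure: it replaces the extremal regression quantile by the sample minimum of Uniform(0,1) data, for which $b$ times the minimum has a standard exponential limit. The function name `subsample_quantile` and all tuning values are hypothetical.

```python
import math
import random

def subsample_quantile(data, b, n_sub, alpha, rng):
    """Subsampling estimate of the alpha-quantile of b * (subsample min - full-sample min)."""
    full_min = min(data)                         # full-sample estimate, the analog of beta-hat
    stats = []
    for _ in range(n_sub):
        sub = rng.sample(data, b)                # b observations drawn without replacement
        stats.append(b * (min(sub) - full_min))  # feasible statistic, centered at the full-sample estimate
    stats.sort()
    idx = min(n_sub - 1, max(0, math.ceil(alpha * n_sub) - 1))
    return stats[idx]                            # empirical alpha-quantile of the subsample statistics

rng = random.Random(0)
data = [rng.random() for _ in range(20000)]      # toy data: Uniform(0,1), lower endpoint 0
c_hat = subsample_quantile(data, b=200, n_sub=500, alpha=0.5, rng=rng)
c_lim = -math.log(1 - 0.5)                       # median of the Exp(1) limit law of b * min
print(c_hat, c_lim)
```

Centering at the full-sample minimum rather than the true endpoint is the toy counterpart of replacing $\beta(\tau_b)$ by $\hat\beta(\tau_b)$; the centering error is of order $b/T$ here, which is why $b/T \to 0$ matters.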



Appendix D. Proof of Theorems 6 and 7 



We will prove Theorem 7; the proof of Theorem 6 follows similarly. 

We prove the theorem by showing that the law of the limit variables is continuous in the underlying 
parameters, which implies the validity of the proposed procedure. This proof structure is similar to 
the one used in the parametric bootstrap proofs, with the complication that the limit distributions 
here are non-standard. The demonstration of continuity poses some difficulties, which we deal with 
by invoking epi-convergence arguments and exploiting the properties of the Poisson process (A.20). 
We also carry out the proof for the case with $\xi < 0$; the proof for the case with $\xi > 0$ is identical 
apart from a change in sign in the definition of the points of the Poisson process, as indicated in the 
proof of Theorem 1. 

Let us first list the basic objects with which we will work: 

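1. The parameters are $\xi \in (-\infty, 0)$, $\gamma \in \mathbb{R}^d$, and $F_X \in \mathcal{F}_X$, a distribution function on $\mathbb{R}^d$ with 
the compact support $\mathbf{X}$. We have the set of estimates $(\hat\xi, \hat\gamma, \hat F_X)$ such that: 

$$\sup_{x \in \mathbf{X}} |\hat F_X(x) - F_X(x)| \to_p 0, \quad \hat\xi \to_p \xi, \quad \hat\gamma \to_p \gamma \quad \text{as } T \to \infty, \tag{D.26}$$ 

where $\hat F_X \in \mathcal{F}_X$. The set $\mathcal{F}_X$ is the set of non-lattice distributions defined in Appendix E. The 
underlying probability space $(\Omega, \mathcal{F}, P)$ is the original probability space induced by the data. 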

2. N is a Poisson random measure (PRM), with mean intensity measure rriN, and points repre- 
sentable as: {T~^ ■ X^j, Xj), j = 1, 2, 3, .... N is a random element of a complete and separable 
metric space of point measures $(M_p(E), \rho_v)$ with metric $\rho_v$ generated by the topology of vague 
convergence. The underlying probability space $(\Omega', \mathcal{F}', P')$ is the one induced by Monte-Carlo draws of 
points of N. The law of N in $(M_p(E), \rho_v)$ will be denoted as $\mathcal{L}(N \mid \xi, \gamma, F_X)$. The law depends only 
on the parameters $(\xi, \gamma, F_X)$ of the intensity measure $m_N$. 

3. The random objective function (ROF) takes the form 

$$z \mapsto Q_\infty(z; k) = -kE[X]'(z + k^{-\xi}\gamma) + \int_E \{x'(z + k^{-\xi}\gamma) - u^{-\xi} \cdot x'\gamma\}_+ \, dN(u, x)$$ 
$$= -kE[X]'(z + k^{-\xi}\gamma) + \sum_{t=1}^{\infty} \{X_t'(z + k^{-\xi}\gamma) - \Gamma_t^{-\xi} \cdot X_t'\gamma\}_+ \tag{D.27}$$ 

and is a random element of the metric space of proper lower-semi-continuous functions $(LC(\mathbb{R}^d), \rho_e)$, 
equipped with the metric $\rho_e$ induced by the topology of epi-convergence. Geyer (1996) and Knight 
(1999) provide a detailed introduction to epi-convergence, with connections to convexity and stochastic 
equi-semicontinuity. Moreover, this function is convex in $z$, which is a very important property 
for what follows. The law of $z \mapsto Q_\infty(z; k)$ in $(LC(\mathbb{R}^d), \rho_e)$ will be denoted as $\mathcal{L}(Q_\infty(\cdot; k) \mid \xi, \gamma, F_X)$. 
This law depends only on the parameters $(\xi, \gamma, F_X)$. 

4. The extremum statistic $Z_\infty(k) = \arg\min_{z \in \mathbb{R}^d} Q_\infty(z; k)$ is a random element in the metric 
space $\mathbb{R}^d$, equipped with the usual Euclidean metric. The law of $Z_\infty(k)$ in $\mathbb{R}^d$ will be denoted as 
$\mathcal{L}(Z_\infty(k) \mid \xi, \gamma, F_X)$. This law depends only on the parameters $(\xi, \gamma, F_X)$. 
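To make the objects in this list concrete, the following sketch draws the extremum statistic in the simplest special case. It is an illustration under stated assumptions, not the paper's Monte Carlo procedure: it takes $d = 1$, $X_t \equiv 1$, $\gamma = 1$, $\xi < 0$ and a non-integer $k$, and reads the sum form of (D.27) as $-k(z + k^{-\xi}) + \sum_t \{(z + k^{-\xi}) - \Gamma_t^{-\xi}\}_+$ with $\Gamma_t$ a cumulative sum of standard exponentials. In that case the objective is convex and piecewise linear, and its minimizer in the shifted variable $v = z + k^{-\xi}$ is the $\lceil k \rceil$-th smallest kink. All function names are hypothetical.

```python
import math
import random

def draw_gammas(n_points, rng):
    """Gamma_t = E_1 + ... + E_t with i.i.d. standard exponential E_j."""
    total, out = 0.0, []
    for _ in range(n_points):
        total += rng.expovariate(1.0)
        out.append(total)
    return out

def objective(v, k, kinks):
    """Intercept-only objective in the shifted variable v = z + k**(-xi)."""
    return -k * v + sum(max(v - j, 0.0) for j in kinks)

def z_infinity(k, xi, rng, n_points=60):
    """One draw of the argmin: brute force over the kinks, plus the closed form."""
    kinks = [g ** (-xi) for g in draw_gammas(n_points, rng)]   # increasing in t because xi < 0
    v_star = min(kinks, key=lambda v: objective(v, k, kinks))  # the minimum of a convex piecewise-linear function sits at a kink
    closed_form = kinks[math.ceil(k) - 1]                      # the ceil(k)-th smallest kink
    return v_star - k ** (-xi), closed_form - k ** (-xi)

rng = random.Random(1)
draws = [z_infinity(k=2.5, xi=-0.5, rng=rng) for _ in range(5)]
for brute, closed in draws:
    print(brute, closed)
```

Truncating the sum at `n_points` terms does not move the minimizer here, because kinks far beyond the $\lceil k \rceil$-th one only add slope to the right of the minimum.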

Next we collect together several weak convergence properties of the key random elements, which 
are most pertinent to establishing the final result. 

A. A sequence of PRM $(N^m, m = 1, 2, \ldots)$ in $(M_p(E), \rho_v)$ defined by the sequence of intensity 
measures $m_{N^m}$ with parameters $(\xi^m, \gamma^m, F_X^m)$ converges weakly to a PRM N with intensity measure 
$m_N$ with parameters $(\xi, \gamma, F_X)$ if the law of the former converges to the law of the latter with respect 
to the Bounded-Lipschitz metric $\rho_w$ (or any other metric that metrizes weak convergence): 

$$\lim_{m \to \infty} \rho_w\big(\mathcal{L}(N^m \mid \xi^m, \gamma^m, F_X^m), \mathcal{L}(N \mid \xi, \gamma, F_X)\big) = 0. \tag{D.28}$$ 

The weak convergence of PRMs is equivalent to pointwise convergence of their Laplace functionals: 

$$\lim_{m \to \infty} \varphi(f; N^m) = \varphi(f; N), \quad \forall f \in C_K^+(E), \tag{D.29}$$ 

where $C_K^+(E)$ is the set of continuous positive functions $f$ defined on the domain $E$ and vanishing 
outside a compact subset of $E$. The Laplace functional is defined as and equal to: 

$$\varphi(f; N) := E\big[e^{-\int_E f(u,x)\, dN(u,x)}\big] = \exp\Big(-\int_E [1 - e^{-f(u,x)}]\, dm_N(u,x)\Big). \tag{D.30}$$ 
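The identity in (D.30) can be checked by simulation in a simplified setting. The sketch below is an illustration only and makes several assumptions not in the text: it uses a one-dimensional unit-rate Poisson process on $[0, 2]$ in place of the measure N on $E$, a hypothetical test function $f(u) = \max(0, 1 - u)$, and helper names chosen for this example. For this choice $\int [1 - e^{-f(u)}]\, du = e^{-1}$, so the Laplace functional equals $\exp(-e^{-1})$.

```python
import math
import random

def poisson_points(c, rng):
    """Points of a unit-rate Poisson process on [0, c], built from exponential gaps."""
    pts, t = [], rng.expovariate(1.0)
    while t <= c:
        pts.append(t)
        t += rng.expovariate(1.0)
    return pts

def f(u):
    """Test function: continuous, nonnegative, zero outside [0, 1]."""
    return max(0.0, 1.0 - u)

def laplace_mc(n_draws, c, rng):
    """Monte Carlo estimate of E[exp(-sum_j f(Gamma_j))]."""
    total = 0.0
    for _ in range(n_draws):
        total += math.exp(-sum(f(u) for u in poisson_points(c, rng)))
    return total / n_draws

rng = random.Random(2)
phi_mc = laplace_mc(20000, 2.0, rng)
phi_exact = math.exp(-math.exp(-1.0))   # exp(-int_0^1 [1 - e^{-(1-u)}] du) = exp(-e^{-1})
print(phi_mc, phi_exact)
```

Because the right-hand side of (D.30) depends on the process only through the intensity measure, convergence of intensity measures translates directly into convergence of Laplace functionals, which is how (D.29) is verified later in the proof.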

B. A sequence of ROFs $\{Q_\infty^m(\cdot; k), m = 1, 2, 3, \ldots\}$ defined by the sequence of parameters $\{(\xi^m, \gamma^m, F_X^m), m = 
1, 2, 3, \ldots\}$ converges weakly to the ROF $Q_\infty(\cdot; k)$ defined by parameters $(\xi, \gamma, F_X)$ in the metric space 
$(LC(\mathbb{R}^d), \rho_e)$, if the law of the former converges to the law of the latter with respect to the Bounded-Lipschitz 
metric $\rho_w$ (or any other metric that metrizes weak convergence): 

$$\lim_{m \to \infty} \rho_w\big(\mathcal{L}(Q_\infty^m(\cdot; k) \mid \xi^m, \gamma^m, F_X^m), \mathcal{L}(Q_\infty(\cdot; k) \mid \xi, \gamma, F_X)\big) = 0. \tag{D.31}$$ 

Moreover, since the objective functions are convex in $z$, the above weak convergence is equivalent to 
the finite-dimensional weak convergence: 

$$(Q_\infty^m(z_j; k), j = 1, \ldots, J) \to_d (Q_\infty(z_j; k), j = 1, \ldots, J) \quad \text{in } \mathbb{R}^J \tag{D.32}$$ 

for any finite collection of points $(z_j, j = 1, \ldots, J)$. The result, that finite-dimensional convergence in 
distribution of convex functions implies epi-convergence in distribution, is due to Geyer (1996) and 
Knight (1999). Thus, in order to check (D.31) we only need to check (D.32). 

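C. In turn, the weak convergence of objective functions $\{Q_\infty^m(\cdot; k), m = 1, 2, 3, \ldots\}$ to $Q_\infty(\cdot; k)$ in 
$(LC(\mathbb{R}^d), \rho_e)$ implies, as $m \to \infty$, the weak convergence of the corresponding argmins: $Z_\infty^m(k) \to_d 
Z_\infty(k)$ in $\mathbb{R}^d$, that is, 

$$\lim_{m \to \infty} \rho_w\big(\mathcal{L}(Z_\infty(k) \mid \xi^m, \gamma^m, F_X^m), \mathcal{L}(Z_\infty(k) \mid \xi, \gamma, F_X)\big) = 0. \tag{D.33}$$ 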

The proof is now done in two steps: 

I. We would like to show that the law $\mathcal{L}(Z_\infty(k) \mid \xi', \gamma', F_X')$ is continuous at $(\xi', \gamma', F_X') = (\xi, \gamma, F_X)$ 
for each $(\xi, \gamma, F_X)$ in the parameter space, that is, for any sequence $(\xi^m, \gamma^m, F_X^m, m = 1, 2, \ldots)$ such 
that 

$$|\xi^m - \xi| \to 0, \quad |\gamma^m - \gamma| \to 0, \quad \sup_{x \in \mathbf{X}} |F_X^m(x) - F_X(x)| \to 0 \tag{D.34}$$ 

with $F_X^m \in \mathcal{F}_X$, we have 

$$\rho_w\big(\mathcal{L}(Z_\infty(k) \mid \xi^m, \gamma^m, F_X^m), \mathcal{L}(Z_\infty(k) \mid \xi, \gamma, F_X)\big) \to 0. \tag{D.35}$$ 

II. Given this continuity property, as $|\hat\xi - \xi| \to_p 0$, $|\hat\gamma - \gamma| \to_p 0$, $\sup_{x \in \mathbf{X}} |\hat F_X(x) - F_X(x)| \to_p 0$, 
we have by the Continuous Mapping Theorem 

$$\rho_w\big(\mathcal{L}(Z_\infty(k) \mid \hat\xi, \hat\gamma, \hat F_X), \mathcal{L}(Z_\infty(k) \mid \xi, \gamma, F_X)\big) \to_p 0. \tag{D.36}$$ 

That is, the law $\mathcal{L}(Z_\infty(k) \mid \hat\xi, \hat\gamma, \hat F_X)$ generated by the Monte Carlo procedure consistently estimates 
the limit law $\mathcal{L}(Z_\infty(k) \mid \xi, \gamma, F_X)$, which is what we needed to prove, since this result implies the 
convergence of the respective distribution functions at the continuity points of the limit distribution. 
Convergence of the distribution functions at the continuity points of the limit distribution implies 
convergence of the respective quantiles to the quantiles of the limit distribution provided the latter 
are positioned at the continuity points of the limit distribution function. 

Thus it only remains to show the key continuity step I. We have that 

$$(\text{D.34}) \overset{(1)}{\Longrightarrow} (\text{D.29}) \overset{(2)}{\Longrightarrow} (\text{D.28}) \overset{(3)}{\Longrightarrow} (\text{D.32}) \overset{(4)}{\Longrightarrow} (\text{D.31}) \overset{(5)}{\Longrightarrow} (\text{D.35}),$$ 

where (1) follows by direct calculations: for $g(u, x) = 1 - e^{-f(u,x)}$ and any $f \in C_K^+(E)$ 

$$|\varphi(f; N^m) - \varphi(f; N)| \le \varphi(f; N) \cdot \Big| \exp\Big( \int_E g(u, x)\,[dm_N - dm_{N^m}] \Big) - 1 \Big| \to 0$$ 

as $\int_E g(u, x)\,[dm_N - dm_{N^m}] \to 0$, which follows from the definition of the measure $m_N$ stated 
earlier; (2) follows by the preceding discussion in Step A; (3) follows by the continuity of the mapping 

$$N \mapsto (Q_\infty(z_j; k), j = 1, \ldots, J)$$ 

from $(M_p(E), \rho_v)$ to $\mathbb{R}^J$, as noted in the proof of Theorem 1; and (4) and (5) follow by the preceding 
discussion in Step C. □ 

Appendix E. Uniqueness and Continuity 

Define $k := \lim_{T \to \infty} \tau T$ and fix an $m$ such that $k(m - 1) > d$ where $d = \dim(X)$. 

Let $\{X_t, t \ge 1\}$ be an i.i.d. sequence from a distribution function $F_X$ such that $E[XX']$ is positive 
definite. Define $G_j := (kE[X] - \sum_{t \le j} X_t)'([X_{j+1} \ldots X_{j+d}]')^{-1}$ if the matrix $[X_{j+1} \ldots X_{j+d}]'$ is 
invertible, and $G_j := (\infty, \ldots, \infty)$ otherwise. Denote by $\mathcal{F}_X(k)$ the class of distributions $F_X$ for which 
$P_{F_X}\{G_j \in \partial(0, 1)^d\} = 0$ for all integers $j \ge 0$. 

Definition (Non-Lattice Condition Given k and m). $F_X \in \mathcal{F}_X(k')$ for both $k' = k$ and $k' = mk$. 
Denote the class of all non-lattice distributions as $\mathcal{F}_X = \mathcal{F}_X(k) \cap \mathcal{F}_X(mk)$. 

Lemma 1. If $F_X \in \mathcal{F}_X$, then $Z_\infty(k)$ and $\mathcal{Z}_\infty(k)$ are uniquely defined random vectors. Moreover, 
for any $\psi \ne 0$, $\psi' Z_\infty(k)$ and $\psi' \mathcal{Z}_\infty(k)$ have continuous distribution functions. 

Comment E.1. The non-lattice condition is an analog of Koenker and Bassett's (1978) condition for 
uniqueness of quantile regression in finite samples. This condition trivially holds if the nonconstant 
covariates $X_{-1t}$ are absolutely continuous. Uniqueness therefore holds generically in the sense that 
for a fixed $k$ adding arbitrarily small absolutely continuous perturbations to $\{X_{-1t}\}$ ensures it. 

Proof: Step 1. We have from Theorem 1 that $Z_\infty(k) = \tilde Z_\infty(k) + c$ for some constant $c$, where 
$\tilde Z_\infty$ is defined in Step 2 of the proof of Theorem 1. Chernozhukov (2005) shows that a sufficient 
condition for tightness of the possibly set-valued $\tilde Z_\infty(k)$ is $E[XX'] > 0$. Taking tightness as given, 
conditions for uniqueness and continuity of $\tilde Z_\infty(k)$ can be established. Define $H$ as the set of all 
$d$-element subsets of $\mathbb{N} = \{1, 2, 3, \ldots\}$. For $h \in H$, let $X(h)$ and $J(h)$ be the matrix with rows 
$\{X_t, t \in h\}$, and vector with elements $\{J_t, t \in h\}$, respectively, where $J_t$ are defined in the proof of 
Theorem 1. Let $H^* = \{h \in H : |X(h)| \ne 0\}$. Note that $H^*$ is non-empty a.s. by $E[XX']$ positive 
definite and is countable. By the same argument as in the proof of Theorem 3.1 of Koenker and 
Bassett (1978) at least one element of $\tilde Z_\infty(k)$ takes the form $z_h = X(h)^{-1} J(h)$ for some $h \in H^*$, 
and must satisfy a sub-gradient condition: 

$$\zeta_k(z_h) := \Big(kE[X] - \sum_{t=1}^{\infty} 1(J_t < X_t' z_h) X_t\Big)' X(h)^{-1} \in [0, 1]^d$$ 

and the argmin is unique if and only if $\zeta_k(z_h) \in \mathcal{D} = (0, 1)^d$. By the same argument as in the proof 
of Theorem 3.4 in Koenker and Bassett (1978), $z_h$ must obey 



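$$k - d \le \sum_{t=1}^{\infty} 1(J_t < X_t' z_h) \le k.$$ 

Then, uniqueness holds for a fixed $k > 0$ if $P\{\exists h \in H^* : \zeta_k(z_h) \in \partial\mathcal{D}\} = 0$. To show this is the 
case, define $\mathcal{M}(j)$ as the set of all $j$-element subsets of $\mathbb{N}$, and define for $\mu \in \mathcal{M}(j)$, $G(\mu, h) := 
(kE[X] - \sum_{t \in \mu} X_t)' X(h)^{-1}$ if $X(h)$ is invertible, and $G(\mu, h) := (\infty, \ldots, \infty)$ otherwise. Now note 
that if $P_{F_X}\{G(\mu, h) \in \partial\mathcal{D}\} = 0$ for any $h \in H$ and $\mu \in \mathcal{M}(j)$ such that $h \cap \mu = \emptyset$ and any integer 
$j \ge 0$, then 

$$P\{\exists h \in H^* : \zeta_k(z_h) \in \partial\mathcal{D}\}$$ 
$$\le P\{G(\mu, h) \in \partial\mathcal{D}, \exists h \in H, \exists \mu \in \mathcal{M}(j), \exists j \ge 0 : h \cap \mu = \emptyset, k - d \le j \le k\}$$ 
$$\le \sum_{(k-d) \vee 0 \le j \le k} \ \sum_{h \in H} \ \sum_{\mu \in \mathcal{M}(j) : h \cap \mu = \emptyset} P_{F_X}\{G(\mu, h) \in \partial\mathcal{D}\} = 0,$$ 

since the summation is taken over the countable set. Finally, by the i.i.d. assumption and $h \cap \mu = \emptyset$, 
$P_{F_X}\{G(\mu, h) \in \partial\mathcal{D}\} = P_{F_X}\{G_j \in \partial\mathcal{D}\}$, where $G_j$ is defined above. Therefore, $P_{F_X}\{G_j \in \partial\mathcal{D}\} = 0$ 
for all integers $j \ge 0$ is the condition that suffices for uniqueness. 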

Step 2. Next we want to show that the distribution function of $\psi' \tilde Z_\infty(k)$ has no point masses, 
that is $P\{\psi' \tilde Z_\infty(k) = x\} = 0$ for each $x \in \mathbb{R}$ and each vector $\psi \ne 0$, which is equivalent 
to showing continuity of $x \mapsto P\{\psi' \tilde Z_\infty(k) \le x\}$ at each $x$. Indeed, from above we have that 
$\tilde Z_\infty(k) = X(h)^{-1} J(h)$ for some $h \in H^*$ a.s., and $P\{\psi' X(h)^{-1} J(h) = x \mid \{X_t, t \ge 1\}\} = 0$ for 
each $h \in H^*$ a.s., since $J(h)$ is absolutely continuous conditional on $\{X_t, t \ge 1\}$ and $\psi' X(h)^{-1} \ne 0$. 
Therefore, 

$$P\{\psi' \tilde Z_\infty(k) = x\} \le E\Big[ \sum_{h \in H^*} P\{\psi' X(h)^{-1} J(h) = x \mid \{X_t, t \ge 1\}\} \Big] = 0,$$ 

by countability of $H$ and the law of iterated expectations. 

Step 3. Next we want to show that the distribution function of $\psi' \mathcal{Z}_\infty(k)$ has no point masses, 
that is $P\{\psi' \mathcal{Z}_\infty(k) = x\} = 0$ for each $x \in \mathbb{R}$ and each vector $\psi \ne 0$. We have that 

$$\mathcal{Z}_\infty(k) = \sqrt{k}\,(\tilde Z_\infty(k) + c) / (E[X]'(\tilde Z_\infty(mk) - \tilde Z_\infty(k)))$$ 

for some $c \ne 0$. From Steps 1 and 2 we have that solutions $\tilde Z_\infty(mk)$ and $\tilde Z_\infty(k)$ are a.s. unique 
and a.s. take the form $\tilde Z_\infty(k) = z_{h_1} = X(h_1)^{-1} J(h_1)$ and $\tilde Z_\infty(mk) = z_{h_2} = X(h_2)^{-1} J(h_2)$ for some 
$(h_1, h_2) \in H^* \times H^*$. Furthermore, $mk - k > d$ implies $h_1 \ne h_2$ and hence a.s. $z_{h_1} \ne z_{h_2}$. Indeed, to 
see this by Step 2 we must have the inequality 

$$k - d \le \sum_{t=1}^{\infty} 1(J_t < X_t' z_{h_1}) \le k \quad \text{and} \quad mk - d \le \sum_{t=1}^{\infty} 1(J_t < X_t' z_{h_2}) \le mk,$$ 

so that $h_1 = h_2$ implies $mk - k \le d$. Furthermore, arguing similarly to Bassett and Koenker's (1982) 
Theorem 2.2, we observe that for $h_1$ and $h_2$ defined above 



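$$-kE[X]' z_{h_1} + \int_E \ell(u, x' z_{h_1})\, dN(u, x) - (mk - k)E[X]' z_{h_2}$$ 
$$< -kE[X]' z_{h_2} + \int_E \ell(u, x' z_{h_2})\, dN(u, x) - (mk - k)E[X]' z_{h_2}$$ 
$$= -mkE[X]' z_{h_2} + \int_E \ell(u, x' z_{h_2})\, dN(u, x)$$ 
$$< -mkE[X]' z_{h_1} + \int_E \ell(u, x' z_{h_1})\, dN(u, x)$$ 
$$= -kE[X]' z_{h_1} + \int_E \ell(u, x' z_{h_1})\, dN(u, x) - (mk - k)E[X]' z_{h_1}.$$ 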



Solving this inequality we obtain $(mk - k)E[X]'(z_{h_2} - z_{h_1}) > 0$. We conclude therefore that a.s. 
$E[X]'(\tilde Z_\infty(mk) - \tilde Z_\infty(k)) = E[X]' X(h_2)^{-1} J(h_2) - E[X]' X(h_1)^{-1} J(h_1) > 0$. Moreover, conditional 
on $\{X_t, t \ge 1\}$ we can show by a perturbation argument that $h_1 \ne h_2$ must be such that 
$E[X]'(\tilde Z_\infty(mk) - \tilde Z_\infty(k)) = c_1' J(h_2) + c_2' J(h_1 \setminus h_2)$ for some constant $c_2 \ne 0$. Let us denote by $\mathcal{G}$ 
the set of all pairs $h_1 \ne h_2$ in $H^* \times H^*$ that obey these two conditions. 

From Step 1 and from $E[X]'(\tilde Z_\infty(mk) - \tilde Z_\infty(k)) > 0$ a.s., it follows that $\mathcal{Z}_\infty(k)$ is a proper random 
variable. Furthermore, for any $x \in \mathbb{R}$, a.s. for any $(h_1, h_2) \in \mathcal{G}$ and $S(h_1, h_2) = E[X]' X(h_2)^{-1} J(h_2) - 
E[X]' X(h_1)^{-1} J(h_1)$, $P\{\psi'(X(h_1)^{-1} J(h_1) + c)/S(h_1, h_2) = x \cap (h_1, h_2) \in \mathcal{G} \mid \{X_t, t \ge 1\}\} = 0$. The 
claim follows because, for any $(h_1, h_2) \in \mathcal{G}$, $\psi'(X(h_1)^{-1} J(h_1) + c)/S(h_1, h_2)$ is absolutely continuous 
conditional on $\{X_t, t \ge 1\}$ by $\psi' X(h_1)^{-1} J(h_1)$ and $S(h_1, h_2)$ being jointly absolutely continuous 
conditional on $\{X_t, t \ge 1\}$ and by the non-singularity of the transformation $(w, v) \mapsto [w + c]/v$ over 
the region $v > 0$. Therefore, for any $x \in \mathbb{R}$, $P\{\psi' \mathcal{Z}_\infty(k) = x\}$ is bounded above by 

$$E\Big[ \sum_{(h_1, h_2) \in \mathcal{G}} P\{\psi'(X(h_1)^{-1} J(h_1) + c)/S(h_1, h_2) = x \cap (h_1, h_2) \in \mathcal{G} \mid \{X_t, t \ge 1\}\} \Big] = 0,$$ 



by countability of $H$ and the law of iterated expectations. 

Figure 2. Coverage of extremal confidence intervals and normal confidence 
intervals when disturbances are t(ν), ν ∈ {1, 3, 30}. Based on 1,000 repetitions. 
[Panel titles: B. t(3)-disturbance: Coverage for Intercept; C. t(30)-disturbance: Coverage for Intercept.] 

Figure 3. Coverage of extremal confidence intervals and normal confidence 
intervals when disturbances are Weibull(α), α ∈ {1, 3, 30}. Based on 1,000 
repetitions. 
[Panel titles: F. Weibull(30)-disturbance: Coverage for Intercept; F. Weibull(30)-disturbance: Coverage for Slopes.] 

Figure 4. QR coefficient estimates and 90% pointwise confidence intervals. 
The solid lines depict extremal confidence intervals. The dashed lines depict 
normal confidence intervals. 
[Panel titles: Spot Oil Return (+); Spot Oil Return (-).] 

Figure 5. Bias-corrected QR coefficient estimates and 90% pointwise confidence 
intervals for τ < .15. The solid lines depict extremal confidence 
intervals. The dashed lines depict normal confidence intervals. 

Figure 6. Bias-corrected QR coefficient estimates and 90% pointwise confidence 
intervals for τ > .85. The solid lines depict extremal confidence intervals. The 
dashed lines depict normal confidence intervals. 

Figure 7. Birthweight densities for black and white mothers. 

Figure 8. Bias-corrected QR coefficient estimates and 90% pointwise confidence 
intervals for τ < .025. The solid lines depict extremal confidence 
intervals. The dashed lines depict normal confidence intervals. 

Figure 9. QR coefficient estimates and 90% pointwise confidence intervals 
for τ ∈ [.025, .975]. The solid lines depict extremal confidence intervals. 
The dashed lines depict normal confidence intervals. 



